Clustering Large Graphs via the Singular Value Decomposition

@article{Drineas2004ClusteringLG,
  title={Clustering Large Graphs via the Singular Value Decomposition},
  author={Petros Drineas and A. Frieze and R. Kannan and S. Vempala and V. Vinay},
  journal={Machine Learning},
  year={2004},
  volume={56},
  pages={9--33}
}
We consider the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al., 2000). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional…
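The two objects in the abstract can be made concrete with a short sketch (function names and the example data are ours, not from the paper): the discrete k-means cost sums squared distances to cluster centers, while the continuous relaxation replaces clusters with the best rank-k subspace, whose cost is the sum of the discarded squared singular values.

```python
import numpy as np

def kmeans_cost(A, labels, centers):
    """Sum of squared distances between each point A[i] and its cluster center."""
    return float(sum(np.sum((A[i] - centers[labels[i]]) ** 2) for i in range(len(A))))

def svd_relaxation_cost(A, k):
    """Cost of the continuous relaxation: squared Frobenius distance from A to
    its best rank-k approximation, i.e. the sum of the discarded squared
    singular values (Eckart-Young)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s[k:] ** 2))
```

Because every clustering induces a rank-k approximation of the data matrix, the relaxation's cost lower-bounds the optimal k-means cost for the same k.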
Approximating K-means-type Clustering via Semidefinite Programming
This paper first models MSSC as a 0-1 semidefinite programming (SDP) problem and shows that this model provides a unified framework for several clustering approaches, such as normalized k-cut and spectral clustering.
Graph partitioning into isolated, high conductance clusters: theory, computation and applications to preconditioning
It is shown how to decompose a graph on n vertices into a collection P of vertex-disjoint clusters such that, for every cluster C ∈ P, the graph induced by the vertices in C and the edges leaving C has conductance bounded below by φ.
Spectral Clustering by Ellipsoid and Its Connection to Separable Nonnegative Matrix Factorization
A variant of the normalized-cut algorithm for spectral clustering is presented that applies the k-means algorithm to the eigenvectors of a normalized graph Laplacian to find clusters; the algorithm is shown to share similarity with an ellipsoidal rounding algorithm for separable nonnegative matrix factorization.
Global optimality in k-means clustering
A new algorithm is provided that reduces both the exponent and the constant factor, to the extent that it becomes feasible for relevant particular cases; it also parallelizes extremely well, so that its implementation on current high-performance hardware is quite straightforward.
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
This work shows how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained rank-k approximation problems to within (1+ε) error, and gives a simple alternative to known algorithms with applications in the streaming setting.
The seeding algorithms for spherical k-means clustering
This paper mainly studies seeding algorithms for spherical k-means clustering, both for its special case with separable sets and for its generalization, the α-spherical k-means clustering problem.
k-Means: Outliers-Resistant Clustering+++
This work generalizes k-means++ to support outliers in two senses: (i) nonmetric spaces, e.g., M-estimators, where the distance dist(p, x) between a point p and a center x is replaced by min{dist(p, x), c} for an appropriate constant c that may depend on the scale of the input.
Goal Clustering: VNS based heuristics
Two Variable Neighborhood Search (VNS) based heuristics for the GC problem are presented, built on Ward's construction and the k-means method, characterizing a different methodology within unsupervised non-hierarchical clustering.
Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems
We consider two related discrete optimization problems of searching for a subset in a finite set of points in Euclidean space. Both problems are induced by versions of a fundamental problem in data…
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
Going beyond Bregman divergences, this paper proves an approximation factor for tensor clustering with arbitrary separable metrics and derives the first (to the authors' knowledge) guaranteed methods for these increasingly important clustering settings.

References

Showing 1–10 of 40 references
A Constant-Factor Approximation Algorithm for the k-Median Problem
This work presents the first constant-factor approximation algorithm for the metric k-median problem, improving upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.
Polynomial time approximation schemes for geometric k-clustering
  • R. Ostrovsky, Y. Rabani
  • Computer Science, Mathematics
  • Proceedings 41st Annual Symposium on Foundations of Computer Science
  • 2000
This work deals with the problem of clustering data points, and gives polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and R^d with either L1 or L2 distance.
The analysis of a simple k-means clustering algorithm
This paper presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than for the center points.
Polynomial-time approximation schemes for geometric min-sum median clustering
These transformations are used to solve NP-hard clustering problems in the cube as well as in geometric settings, and it is shown that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube.
Fast Monte-Carlo algorithms for finding low-rank approximations
  • A. Frieze, R. Kannan, S. Vempala
  • Mathematics, Computer Science
  • Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)
  • 1998
This paper develops an algorithm that is qualitatively faster, provided the entries of the matrix are sampled according to a natural probability distribution; the algorithm takes time polynomial in k, 1/ε, and log(1/δ) only, independent of m and n.
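The sampling idea behind this line of work can be sketched as follows (a hedged illustration of length-squared row sampling in general, not the paper's exact algorithm; function names and constants are ours): sample a few rows of A with probability proportional to their squared norms, rescale, and use the top right singular vectors of the small sample to build a rank-k approximation of A.

```python
import numpy as np

def sampled_low_rank(A, k, s, rng=None):
    """Approximate A by projecting onto the span of the top-k right singular
    vectors of a rescaled sample of s rows, drawn with probability
    proportional to squared row norms (length-squared sampling)."""
    rng = np.random.default_rng(rng)
    p = np.sum(A * A, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=s, p=p)
    S = A[idx] / np.sqrt(s * p[idx])[:, None]   # rescaled s x n sample
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    V = Vt[:k].T                                # approximate top-k right singular vectors
    return A @ V @ V.T                          # project A onto their span
```

The key point matching the summary: the SVD is taken of the small s × n sample, so the expensive step no longer depends on the number of rows m.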
Fast Monte-Carlo algorithms for approximate matrix multiplication
Given an m × n matrix A and an n × p matrix B, we present two simple and intuitive algorithms to compute an approximation P to the product A · B, with provable bounds for the norm of the "error matrix"…
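One standard way to realize such an approximation (an illustrative sketch in the spirit of this line of work, not necessarily the paper's exact algorithm; names are ours) is to sample s column/row outer products with probability proportional to their norm product and rescale, which makes the estimator unbiased:

```python
import numpy as np

def approx_matmul(A, B, s, rng=None):
    """Estimate A @ B as a rescaled sum of s sampled outer products
    A[:, t] B[t, :], with t drawn proportional to ||A[:, t]|| * ||B[t, :]||."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    weights = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = weights / weights.sum()
    idx = rng.choice(n, size=s, p=p)
    P = np.zeros((A.shape[0], B.shape[1]))
    for t in idx:
        P += np.outer(A[:, t], B[t, :]) / (s * p[t])
    return P
```

Each sampled term is divided by s · p[t], so the expectation of P equals A @ B; the variance shrinks as s grows.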
Two algorithms for nearest-neighbor search in high dimensions
A new approach to the nearest-neighbor problem is developed, based on a method for combining randomly chosen one-dimensional projections of the underlying point set, which yields an algorithm for finding ε-approximate nearest neighbors with a query time of O((d log d)(d + log n)).
The eigenvalues of random symmetric matrices
It is shown that with probability 1−o(1) all eigenvalues belong to the interval I above if μ = 0, while in the case μ > 0 only the largest eigenvalue λ1 lies outside I, and λ1 asymptotically has a normal distribution with expectation (n−1)μ + v + σ²/μ and variance 2σ² (bounded variance!).
Improved combinatorial algorithms for the facility location and k-median problems
  • M. Charikar, S. Guha
  • Mathematics, Computer Science
  • 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
  • 1999
Improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems are presented, including a 4-approximation for the k-median problem that builds on the 6-approximation of Jain and Vazirani.
Fast computation of low rank matrix approximations
It is proved that the effect of sampling and quantization nearly vanishes when a low-rank approximation to A + E is computed, where the entries of E are independent random variables with zero mean and bounded variance.