Clustering Large Graphs via the Singular Value Decomposition

@article{Drineas2004ClusteringLG,
  title={Clustering Large Graphs via the Singular Value Decomposition},
  author={Petros Drineas and Alan M. Frieze and Ravi Kannan and Santosh S. Vempala and V. Vinay},
  journal={Machine Learning},
  year={2004},
  volume={56},
  pages={9-33}
}
We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem is NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional…
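For concreteness, the discrete objective and the continuous relaxation sketched in the abstract can be written out as follows (our notation: A is the m × n matrix whose rows are the points, and A_k its best rank-k approximation obtained from the SVD):

```latex
% Discrete k-means objective: assign each point a_i to a cluster pi(i),
% where each center c_j is the centroid of its cluster.
\min_{\pi : [m] \to [k]} \; \sum_{i=1}^{m} \lVert a_i - c_{\pi(i)} \rVert_2^2,
\qquad
c_j = \frac{1}{|\pi^{-1}(j)|} \sum_{i \in \pi^{-1}(j)} a_i .

% Continuous relaxation: replace k centers by a best-fit k-dimensional
% subspace; the optimum is attained by the top-k right singular vectors
% of A, i.e., by the best rank-k approximation A_k.
\min_{V \in \mathbb{R}^{n \times k},\; V^T V = I_k} \; \lVert A - A V V^T \rVert_F^2
\;=\; \lVert A - A_k \rVert_F^2 .
```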
Approximating K-means-type Clustering via Semidefinite Programming
TLDR
This paper first models MSSC (minimum sum-of-squares clustering) as a so-called 0-1 semidefinite programming (SDP) problem and shows that this model provides a unified framework for several clustering approaches, such as normalized k-cut and spectral clustering.
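A hedged sketch of that 0-1 SDP model, following Peng and Wei's formulation (A denotes the matrix of stacked data points; the notation is ours):

```latex
% For a 0-1 assignment matrix X, the matrix Z = X (X^T X)^{-1} X^T is a
% nonnegative projection, and MSSC can be written as the 0-1 SDP
\min_{Z} \; \operatorname{tr}\!\left( A A^T (I - Z) \right)
\quad \text{s.t.} \quad
Z = Z^T, \; Z^2 = Z, \; Z \ge 0, \; Z \mathbf{1} = \mathbf{1}, \; \operatorname{tr}(Z) = k .

% The SDP relaxation replaces the hard constraint Z^2 = Z by 0 \preceq Z \preceq I.
```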
Graph partitioning into isolated, high conductance clusters: theory, computation and applications to preconditioning
TLDR
It is shown how to decompose the n vertices of a graph into a collection P of vertex-disjoint clusters such that, for every cluster C ∈ P, the graph induced by the vertices in C and the edges leaving C has conductance bounded below by φ.
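For reference, the conductance bound above uses the standard definition (our notation):

```latex
% Conductance of a vertex set S in a weighted graph G = (V, E, w):
\phi(S) \;=\; \frac{w(S, V \setminus S)}{\min\{\operatorname{vol}(S),\, \operatorname{vol}(V \setminus S)\}},
\qquad
\operatorname{vol}(S) = \sum_{u \in S} \deg_w(u) .
```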
Spectral Clustering by Ellipsoid and Its Connection to Separable Nonnegative Matrix Factorization
TLDR
A variant of the normalized cut algorithm for spectral clustering is presented; it applies the k-means algorithm to the eigenvectors of a normalized graph Laplacian to find clusters, and the algorithm is shown to share similarity with an ellipsoidal rounding algorithm for separable nonnegative matrix factorization.
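A minimal sketch of this eigenvector-plus-k-means pipeline (our code, in the spirit of standard normalized spectral clustering rather than the paper's exact algorithm; the affinity matrix W, the row normalization, and the k-means call are our assumptions):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    """W: symmetric nonnegative affinity matrix (n x n); returns labels."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors of the k smallest eigenvalues give the embedding.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    # Row-normalize the embedding, then cluster with k-means.
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```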
Global optimality in k-means clustering
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
TLDR
This work shows how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error, and gives a simple alternative to known algorithms that has applications in the streaming setting.
k-Means: Outliers-Resistant Clustering+++
TLDR
This work generalizes k-means++ to support outliers in two senses: (i) non-metric spaces, e.g., M-estimators, where the distance dist(p, x) between a point p and a center x is replaced by min{dist(p, x), c} for an appropriate constant c that may depend on the scale of the input.
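A minimal sketch (our code; the function name and loop structure are ours) of D²-style seeding with the clipped distance min{dist(p, x), c} described above, which caps the influence any single outlier can have on the sampling distribution:

```python
import numpy as np

def clipped_kmeanspp_seeds(X, k, c, rng):
    """X: (m, n) points; returns k seeds chosen by clipped D^2 sampling."""
    seeds = [X[rng.integers(len(X))]]              # first seed: uniform
    d2 = np.full(len(X), np.inf)
    for _ in range(k - 1):
        dist = np.linalg.norm(X - seeds[-1], axis=1)
        d2 = np.minimum(d2, np.minimum(dist, c) ** 2)   # clip at c, square
        seeds.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.vstack(seeds)
```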
The seeding algorithms for spherical k-means clustering
TLDR
This paper mainly studies seeding algorithms for spherical k-means clustering, both for its special case (with separable sets) and for its generalized problem (α-spherical k-means clustering).
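As a rough illustration only (our code; the paper's actual seeding procedures and guarantees may differ), k-means++-style seeding under the spherical dissimilarity 1 − cos(p, x):

```python
import numpy as np

def spherical_seeds(X, k, rng):
    """X: (m, n) points; rows are normalized to the unit sphere."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    seeds = [X[rng.integers(len(X))]]
    diss = np.full(len(X), np.inf)
    for _ in range(k - 1):
        # 1 - cosine similarity, clipped at 0 to guard against round-off.
        diss = np.minimum(diss, np.maximum(1.0 - X @ seeds[-1], 0.0))
        seeds.append(X[rng.choice(len(X), p=diss / diss.sum())])
    return np.vstack(seeds)
```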
Goal Clustering: VNS based heuristics
TLDR
Two Variable Neighborhood Search (VNS) based heuristics, built on Ward's construction and the k-means method, are presented for the GC problem, characterizing a different methodology within unsupervised non-hierarchical clustering.
Exact Algorithms of Search for a Cluster of the Largest Size in Two Integer 2-Clustering Problems
TLDR
Both problems, which concern searching for a subset in a finite set of points in Euclidean space, are shown to be strongly NP-hard, and exact algorithms are presented for the case in which the input points have integer components.
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
TLDR
Going beyond Bregman divergences, this paper proves an approximation factor for tensor clustering with arbitrary separable metrics and derives the first (to the authors' knowledge) guaranteed methods for these increasingly important clustering settings.
...

References

Showing 1-10 of 39 references
A Constant-Factor Approximation Algorithm for the k-Median Problem
TLDR
This work presents the first constant-factor approximation algorithm for the metric k-median problem, improving upon the best previously known guarantee of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.
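For reference, the metric k-median objective (standard definition, our notation) sums plain rather than squared distances:

```latex
% Metric k-median: choose k centers minimizing the total distance
% from each point to its nearest center (contrast with k-means,
% which minimizes squared distances).
\min_{\substack{C \subseteq X \\ |C| = k}} \; \sum_{p \in X} \min_{c \in C} d(p, c) .
```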
Polynomial time approximation schemes for geometric k-clustering
  • R. Ostrovsky, Y. Rabani
  • Computer Science
    Proceedings 41st Annual Symposium on Foundations of Computer Science
  • 2000
TLDR
This work deals with the problem of clustering data points and gives polynomial-time approximation schemes for this problem in several settings, including the binary cube {0, 1}^d with Hamming distance, and R^d with either the L^1 or the L^2 distance.
The analysis of a simple k-means clustering algorithm
TLDR
This paper presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points.
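A minimal Lloyd iteration using a kd-tree for the nearest-center queries (our sketch; note that the filtering algorithm of this paper instead builds a single kd-tree over the data points and prunes candidate centers per tree node, which this sketch does not reproduce):

```python
import numpy as np
from scipy.spatial import cKDTree

def lloyd(X, centers, iters=20):
    """X: (m, n) points; centers: (k, n) initial centers; returns centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        _, labels = cKDTree(centers).query(X)   # nearest center per point
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):                    # leave empty clusters as-is
                centers[j] = members.mean(axis=0)
    return centers
```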
Polynomial-time approximation schemes for geometric min-sum median clustering
TLDR
These transformations are used to solve NP-hard clustering problems in the cube as well as in geometric settings, and it is shown that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube.
Fast Monte-Carlo algorithms for approximate matrix multiplication
  • P. Drineas, R. Kannan
  • Computer Science
    Proceedings 2001 IEEE International Conference on Cluster Computing
  • 2001
Given an m × n matrix A and an n × p matrix B, we present two simple and intuitive algorithms to compute an approximation P to the product A · B, with provable bounds for the norm of the "error matrix".
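A minimal sketch (our code) of the basic sampling scheme behind such algorithms: sample s column-row outer products with probabilities proportional to the products of column and row norms, rescaling each sample so the estimate is unbiased:

```python
import numpy as np

def approx_matmul(A, B, s, rng):
    """Approximate A @ B from s sampled column-row outer products."""
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()                       # norm-based sampling probabilities
    idx = rng.choice(A.shape[1], size=s, p=p)
    # Dividing each outer product by s * p[i] makes the sum unbiased:
    # E[P] = sum_i p_i * outer_i / p_i = A @ B.
    return sum(np.outer(A[:, i], B[i, :]) / (s * p[i]) for i in idx)
```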
Two algorithms for nearest-neighbor search in high dimensions
TLDR
A new approach to the nearest-neighbor problem is developed, based on a method for combining randomly chosen one-dimensional projections of the underlying point set; it results in an algorithm for finding ε-approximate nearest neighbors with a query time of O((d log d)(d + log n)).
The eigenvalues of random symmetric matrices
TLDR
It is shown that with probability 1 − o(1) all eigenvalues belong to the above interval I if μ = 0, while in the case μ > 0 only the largest eigenvalue λ1 is outside I, and λ1 asymptotically has a normal distribution with expectation (n − 1)μ + v + σ²/μ and variance 2σ² (bounded variance!).
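A quick numerical experiment (our code, with arbitrary parameter choices) illustrating the quoted prediction for the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma, v = 1000, 0.5, 1.0, 0.0
# Symmetric matrix: off-diagonal entries i.i.d. with mean mu and variance
# sigma^2; diagonal entries with mean v.
M = np.triu(rng.normal(mu, sigma, (n, n)), 1)
M = M + M.T + np.diag(rng.normal(v, sigma, n))
lam1 = np.linalg.eigvalsh(M)[-1]                  # largest eigenvalue
print(lam1, (n - 1) * mu + v + sigma**2 / mu)     # the two should be close
```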
Improved combinatorial algorithms for the facility location and k-median problems
  • M. Charikar, S. Guha
  • Computer Science
    40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
  • 1999
TLDR
Improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems are presented, including a 4-approximation for the k-median problem that builds on the 6-approximation of Jain and Vazirani.
Fast computation of low rank matrix approximations
TLDR
It is proved that the effect of sampling and quantization nearly vanishes when a low-rank approximation is computed to A + E, where the entries of E are independent random variables with zero mean and bounded variance.
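A minimal sketch (our code) of the sample-then-factor idea: sparsify A entrywise with rescaling, so the perturbation E = Â − A has independent zero-mean entries, then take a truncated SVD of the sparsified matrix:

```python
import numpy as np

def sampled_low_rank(A, k, q, rng):
    """Keep each entry of A with probability q (rescaled by 1/q), then
    return the rank-k truncated SVD of the sparsified matrix."""
    A_hat = np.where(rng.random(A.shape) < q, A / q, 0.0)  # E[A_hat] = A
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```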
Sampling lower bounds via information theory
TLDR
A novel technique, based on the Jensen-Shannon divergence from information theory, is presented for proving lower bounds on the query complexity of sampling algorithms that approximate functions over arbitrary domains and ranges; it gives stronger bounds for functions that possess a large set of inputs.
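For reference, the Jensen-Shannon divergence underlying the technique (standard definition, our notation):

```latex
% Jensen-Shannon divergence between distributions P and Q, where
% M = (P + Q)/2 and D_KL is the Kullback-Leibler divergence:
\operatorname{JS}(P, Q) \;=\; \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M)
 + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
\qquad
M = \tfrac{1}{2}(P + Q) .
```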
...