• Corpus ID: 25517366

Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

@article{Wang2019ScalableKK,
  title={Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds},
  author={Shusen Wang and Alex Gittens and Michael W. Mahoney},
  journal={J. Mach. Learn. Res.},
  year={2019},
  volume={20},
  pages={12:1-12:49}
}
Kernel $k$-means clustering can correctly identify and extract a far more varied collection of cluster structures than the linear $k$-means clustering algorithm. However, kernel $k$-means clustering is computationally expensive when the non-linear feature map is high-dimensional and there are many input points. Kernel approximation, e.g., the Nystrom method, has been applied in previous works to approximately solve kernel learning problems when both of the above conditions are present. This… 
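To make the recipe concrete, here is a minimal sketch of Nystrom-approximated kernel k-means in the generic form the abstract alludes to (uniform landmark sampling and an RBF kernel are assumptions; this is not the paper's exact algorithm or its error-bounded variant): sample landmarks, form the approximate feature map, and run linear k-means on the resulting features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def nystrom_kernel_kmeans(X, n_clusters, n_landmarks=100, gamma=1.0, seed=0):
    """Approximate kernel k-means: K ~= C W^+ C^T, so clustering the rows of
    C @ W^{-1/2} approximates kernel k-means with the full RBF kernel matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)
    landmarks = X[idx]                                 # uniformly sampled landmark points
    C = rbf_kernel(X, landmarks, gamma=gamma)          # n x m cross-kernel block
    W = rbf_kernel(landmarks, landmarks, gamma=gamma)  # m x m landmark kernel
    eigvals, eigvecs = np.linalg.eigh(W)
    keep = eigvals > 1e-10                             # pseudo-inverse square root of W
    W_inv_sqrt = eigvecs[:, keep] @ np.diag(eigvals[keep] ** -0.5) @ eigvecs[:, keep].T
    features = C @ W_inv_sqrt                          # approximate kernel feature map
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(features)
```

The cost is dominated by the n x m kernel block and the m x m eigendecomposition rather than the full n x n kernel matrix, which is what makes the approach scalable.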
Kernel k-Means, By All Means: Algorithms and Strong Consistency
TLDR
This paper generalizes to the kernel and multi-kernel settings recent results that leverage a general family of means to combat sub-optimal local solutions, and characterizes the large-sample behavior of the proposed method by establishing strong consistency guarantees.
On the optimality of kernels for high-dimensional clustering
TLDR
This paper considers the problem of high-dimensional Gaussian clustering and shows that, with the exponential kernel function, the sufficient conditions for partial recovery of clusters using the NP-hard kernel k-means objective match the known information-theoretic limit up to a factor of $\sqrt{2}$ for large $k$.
Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization
TLDR
This paper employs incomplete Cholesky factorization to accelerate kernel clustering and save memory space, and shows both analytically and empirically that the performance of the proposed algorithm is similar to that of the kernel $k$-means clustering algorithm, but the method can deal with large-scale datasets.
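As a rough illustration of the approach just summarized (a sketch under illustrative assumptions, not the authors' implementation): build a pivoted incomplete Cholesky factor G with K ≈ GGᵀ for an assumed RBF kernel, then run ordinary k-means on the rows of G. The function name, kernel choice, and tolerances below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def incomplete_cholesky_kmeans(X, n_clusters, gamma=1.0, tol=1e-6, max_rank=200, seed=0):
    """Pivoted incomplete Cholesky of the RBF kernel matrix (K ~= G @ G.T),
    followed by linear k-means on the rows of G."""
    n = X.shape[0]
    diag = np.ones(n)                  # k(x, x) = 1 for the RBF kernel
    G = np.zeros((n, 0))
    for _ in range(min(max_rank, n)):
        i = int(np.argmax(diag))       # pivot on the largest residual diagonal entry
        if diag[i] <= tol:             # stop once the residual is negligible
            break
        k_i = rbf_kernel(X, X[i:i + 1], gamma=gamma).ravel()   # i-th column of K
        g = (k_i - G @ G[i]) / np.sqrt(diag[i])                # new Cholesky column
        G = np.column_stack([G, g])
        diag = np.maximum(diag - g ** 2, 0.0)
    # Kernel k-means with K ~= G G^T reduces to Euclidean k-means on the rows of G.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(G)
```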
Refined Learning Bounds for Kernel and Approximate k-Means
Kernel k-means is one of the most popular approaches to clustering and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order
Nearly Optimal Clustering Risk Bounds for Kernel K-Means
TLDR
A nearly optimal excess clustering risk bound is obtained, the statistical effect of computational approximations of the Nystrom kernel $k$-means is analyzed, and it is proved that the method achieves the same statistical accuracy as exact kernel $k$-means using only $\Omega(\sqrt{nk})$ Nystrom landmark points.
Randomized Clustered Nystrom for Large-Scale Kernel Machines
TLDR
A novel algorithm to compute the optimal Nystrom low-rank approximation when the number of landmark points exceeds the target rank is presented, and a randomized algorithm for generating landmark points that is scalable to large-scale data sets is introduced.
Coresets for Kernel Clustering
TLDR
The first coreset for kernel $k$-means is devised; its size is independent of the number of input points $n$, and it is constructed in time near-linear in $n$. This implies new algorithms for kernel $k$-means, such as a $(1+\varepsilon)$-approximation in time near-linear in $n$, and a streaming algorithm using space and update time $\mathrm{poly}(k\varepsilon^{-1}\log n)$.
Discrete Multiple Kernel k-means
TLDR
A novel Discrete Multiple Kernel k-means (DMKKM) model is developed and solved by an optimization algorithm that directly obtains the cluster indicator matrix without subsequent discretization procedures; the model is capable of enhancing kernel fusion by reducing redundancy and improving diversity.
Nearly Optimal Risk Bounds for Kernel K-Means
TLDR
A nearly optimal excess risk bound for kernel (or approximate kernel) $k$-means is obtained, substantially improving the state-of-the-art bounds in existing clustering risk analyses, and the statistical effect of computational approximations of the Nystrom kernel $k$-means is analyzed.
Gaussian kernel c-means hard clustering algorithms with automated computation of the width hyper-parameters
TLDR
This paper presents Gaussian kernel c-means hard clustering algorithms with automated computation of the width hyper-parameters; experiments on synthetic data sets and on UCI machine learning repository data sets corroborate the usefulness of the proposed algorithms.

References

SHOWING 1-10 OF 119 REFERENCES
Approximate kernel k-means: solution to large scale kernel clustering
TLDR
It is shown both analytically and empirically that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with dramatically reduced run-time complexity and memory requirements.
Efficient Kernel Clustering Using Random Fourier Features
TLDR
This paper employs random Fourier maps, originally proposed for large-scale classification, to accelerate kernel clustering, and proposes an improved scheme which uses the top singular vectors of the transformed data matrix to perform clustering and yields a better approximation of kernel clustering under appropriate conditions.
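For context, here is a bare-bones sketch of the plain random-Fourier-feature variant (not the improved SVD-based scheme the summary mentions), assuming an RBF kernel; names and parameters are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def rff_kmeans(X, n_clusters, n_features=200, gamma=1.0, seed=0):
    """Map the data with random cosine features so that inner products approximate
    the RBF kernel exp(-gamma * ||x - y||^2), then run linear k-means."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = np.sqrt(2.0 / n_features) * np.cos(X @ W + b)   # z(x)^T z(y) ~ k(x, y)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)
```

The improved scheme summarized above would instead cluster using the top singular vectors of the transformed matrix Z.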
Kernel k-means: spectral clustering and normalized cuts
TLDR
The generality of the weighted kernel k-means objective function is shown, and the spectral clustering objective of normalized cut is derived as a special case, leading to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut.
Clustered Nyström Method for Large Scale Manifold Learning and Dimension Reduction
  • Kai Zhang, J. Kwok
  • Mathematics, Computer Science
    IEEE Transactions on Neural Networks
  • 2010
TLDR
The (non-probabilistic) error analysis justifies a “clustered Nyström method” that uses the k-means clustering centers as landmark points and can be applied to scale up a wide variety of algorithms that depend on the eigenvalue decomposition of the kernel matrix.
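A compressed sketch of that landmark choice (illustrative RBF kernel and names, not the paper's code): take k-means centers of the inputs as landmarks, then form the usual Nystrom feature map from them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def clustered_nystrom_features(X, n_landmarks=100, gamma=1.0, seed=0):
    # Landmarks are k-means centers of the data rather than uniform samples.
    km = KMeans(n_clusters=n_landmarks, n_init=10, random_state=seed).fit(X)
    landmarks = km.cluster_centers_
    C = rbf_kernel(X, landmarks, gamma=gamma)          # n x m cross-kernel block
    W = rbf_kernel(landmarks, landmarks, gamma=gamma)  # m x m landmark kernel
    eigvals, eigvecs = np.linalg.eigh(W)
    keep = eigvals > 1e-10                             # drop near-null directions
    W_inv_sqrt = eigvecs[:, keep] @ np.diag(eigvals[keep] ** -0.5) @ eigvecs[:, keep].T
    return C @ W_inv_sqrt                              # rows approximate the feature map
```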
On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications
  • K. Chen
  • Mathematics, Computer Science
    SIAM J. Comput.
  • 2009
TLDR
These are the first streaming algorithms for these problems whose space complexity has only polynomial dependence on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8 n)$ space.
Randomized Dimensionality Reduction for $k$ -Means Clustering
TLDR
The first provably accurate feature selection method for k-means clustering is presented and, in addition, two feature extraction methods are presented that improve upon the existing results in terms of time complexity and number of features needed to be extracted.
Randomized Clustered Nystrom for Large-Scale Kernel Machines
TLDR
A novel algorithm to compute the optimal Nystrom low-rank approximation when the number of landmark points exceeds the target rank is presented, and a randomized algorithm for generating landmark points that is scalable to large-scale data sets is introduced.
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
TLDR
This work shows how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error, and gives a simple alternative to known algorithms that has applications in the streaming setting.
A local search approximation algorithm for k-means clustering
TLDR
This work considers the question of whether there exists a simple and practical approximation algorithm for k-means clustering, and presents a local improvement heuristic based on swapping centers in and out that yields a (9+ε)-approximation algorithm.
The Hardness of Approximation of Euclidean k-Means
TLDR
The first hardness of approximation for the Euclidean $k$-means problem is provided via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the smallest number of vertices that are incident on all the edges.