# Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

@article{Wang2019ScalableKK, title={Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds}, author={Shusen Wang and Alex Gittens and Michael W. Mahoney}, journal={J. Mach. Learn. Res.}, year={2019}, volume={20}, pages={12:1-12:49} }

Kernel $k$-means clustering can correctly identify and extract a far more varied collection of cluster structures than the linear $k$-means clustering algorithm. However, kernel $k$-means clustering is computationally expensive when the non-linear feature map is high-dimensional and there are many input points. Kernel approximation, e.g., the Nystrom method, has been applied in previous works to approximately solve kernel learning problems when both of the above conditions are present. This…
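The Nystrom approach the abstract refers to can be sketched compactly: sample $m$ landmark points, form the cross-kernel and landmark-kernel matrices, map each input to an $m$-dimensional feature vector whose inner products approximate the kernel, and run ordinary Lloyd iterations on those features. A minimal NumPy sketch, assuming an RBF kernel (function names are illustrative, not from the paper):

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF (Gaussian) kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, m, gamma, rng):
    """Map X to m-dim features Z with Z @ Z.T ~= K (Nystrom approximation)."""
    idx = rng.choice(len(X), size=m, replace=False)    # landmark indices
    C = rbf(X, X[idx], gamma)                          # n x m cross-kernel
    W = rbf(X[idx], X[idx], gamma)                     # m x m landmark kernel
    vals, vecs = np.linalg.eigh(W + 1e-6 * np.eye(m))  # regularized eigendecomp
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return C @ W_inv_sqrt

def lloyd(Z, k, rng, iters=50, restarts=5):
    """Plain k-means (Lloyd's algorithm) on the approximate feature vectors."""
    best, best_obj = None, np.inf
    for _ in range(restarts):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(iters):
            d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = Z[labels == j].mean(0)
        obj = ((Z - centers[labels]) ** 2).sum()
        if obj < best_obj:
            best, best_obj = labels, obj
    return best
```

Each Lloyd iteration on `Z` costs $O(nmk)$ rather than the $O(n^2)$ needed to materialize the full kernel matrix; the paper's contribution is showing that such approximations can come with relative-error guarantees.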

## 63 Citations

Kernel k-Means, By All Means: Algorithms and Strong Consistency

- Computer Science, Mathematics · ArXiv
- 2020

This paper generalizes recent results that leverage a general family of means to combat sub-optimal local solutions, extending them to the kernel and multi-kernel settings, and characterizes the large-sample behavior of the proposed method by establishing strong consistency guarantees.

On the optimality of kernels for high-dimensional clustering

- Mathematics, Computer Science · AISTATS
- 2020

This paper considers the problem of high-dimensional Gaussian clustering and shows that, with the exponential kernel function, the sufficient conditions for partial recovery of clusters under the NP-hard kernel $k$-means objective match the known information-theoretic limit up to a factor of $\sqrt{2}$ for large $k$.

Fast Kernel k-means Clustering Using Incomplete Cholesky Factorization

- Computer Science, Mathematics · Appl. Math. Comput.
- 2021

This paper employs incomplete Cholesky factorization to accelerate kernel clustering and save memory, and shows both analytically and empirically that the proposed algorithm performs similarly to the kernel $k$-means clustering algorithm while scaling to large datasets.
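The incomplete (pivoted) Cholesky idea behind this line of work can be illustrated directly: greedily pick the pivot with the largest remaining diagonal residual and build a low-rank factor $G$ with $K \approx GG^\top$, touching only the selected columns of $K$. A hedged NumPy sketch (names are illustrative):

```python
import numpy as np

def pivoted_cholesky(K, rank, tol=1e-10):
    """Greedy pivoted (incomplete) Cholesky: returns G (n x r) with K ~= G @ G.T."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # remaining diagonal residuals
    G = np.zeros((n, rank))
    pivots = []
    for j in range(rank):
        i = int(np.argmax(d))             # largest residual -> next pivot
        if d[i] < tol:                    # residual exhausted: early stop
            return G[:, :j], pivots
        pivots.append(i)
        G[:, j] = (K[:, i] - G @ G[i]) / np.sqrt(d[i])
        d -= G[:, j] ** 2                 # Schur-complement diagonal update
    return G, pivots
```

Clustering then runs on the rows of `G`, as in other low-rank kernel approximations, so memory drops from $O(n^2)$ to $O(nr)$.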

Refined Learning Bounds for Kernel and Approximate k-Means

- 2021

Kernel k-means is one of the most popular approaches to clustering and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order…

Nearly Optimal Clustering Risk Bounds for Kernel K-Means

- Computer Science
- 2020

A nearly optimal excess clustering risk bound is obtained, and the statistical effect of computational approximations in the Nystrom kernel $k$-means is analyzed; it is proved to achieve the same statistical accuracy as exact kernel $k$-means while using only $\Omega(\sqrt{nk})$ Nystrom landmark points.

Randomized Clustered Nystrom for Large-Scale Kernel Machines

- Mathematics, Computer Science · AAAI
- 2018

A novel algorithm to compute the optimal Nystrom low-rank approximation when the number of landmark points exceeds the target rank is presented, and a randomized algorithm for generating landmark points that scales to large data sets is introduced.

Coresets for Kernel Clustering

- Computer Science · ArXiv
- 2021

The first coreset for kernel $k$-means is devised, whose size is independent of the number of input points $n$ and which is constructed in time near-linear in $n$; this implies new algorithms for kernel $k$-means, such as a $(1+\varepsilon)$-approximation in time near-linear in $n$, and a streaming algorithm using space and update time $\mathrm{poly}(k\,\varepsilon^{-1}\log n)$.

Discrete Multiple Kernel k-means

- Computer Science · IJCAI
- 2021

A novel Discrete Multiple Kernel k-means (DMKKM) model is proposed, with an optimization algorithm that directly obtains the cluster indicator matrix without subsequent discretization procedures, and it enhances kernel fusion by reducing redundancy and improving diversity.

Nearly Optimal Risk Bounds for Kernel K-Means

- Computer Science, Mathematics · ArXiv
- 2020

A nearly optimal excess risk bound for kernel (or approximate kernel) $k$-means is obtained, substantially improving the state-of-the-art bounds in existing clustering risk analyses, and the statistical effect of computational approximations of the Nystrom kernel $k$-means is analyzed.

Gaussian kernel c-means hard clustering algorithms with automated computation of the width hyper-parameters

- Computer Science · Pattern Recognit.
- 2018

This paper presents Gaussian kernel c-means hard clustering algorithms with automated computation of the width hyper-parameters; experiments on synthetic data sets and UCI machine learning repository data sets corroborate the usefulness of the proposed algorithms.

## References

Showing 1–10 of 119 references

Approximate kernel k-means: solution to large scale kernel clustering

- Computer Science, Mathematics · KDD
- 2011

It is shown both analytically and empirically that the performance of approximate kernel k-means is similar to that of the kernel k-means algorithm, but with dramatically reduced run-time complexity and memory requirements.

Efficient Kernel Clustering Using Random Fourier Features

- Computer Science, Mathematics · 2012 IEEE 12th International Conference on Data Mining
- 2012

This paper employs random Fourier maps, originally proposed for large-scale classification, to accelerate kernel clustering, and proposes an improved scheme that uses the top singular vectors of the transformed data matrix to perform clustering, yielding a better approximation of kernel clustering under appropriate conditions.
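The random Fourier feature trick referenced here is easy to state: for a shift-invariant kernel such as the RBF, draw random frequencies from the kernel's Fourier transform and use cosine features whose inner products approximate the kernel in expectation. A minimal sketch, assuming the RBF kernel $\exp(-\gamma\|x-y\|^2)$ (illustrative, not the paper's exact scheme):

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    """D-dim cosine features z(x) with z(x) @ z(y) ~= exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # frequencies ~ N(0, 2*gamma*I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

Linear $k$-means on these $D$-dimensional features then approximates kernel $k$-means; the error in each approximated kernel entry shrinks at rate $O(1/\sqrt{D})$.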

Kernel k-means: spectral clustering and normalized cuts

- Mathematics, Computer Science · KDD
- 2004

The generality of the weighted kernel k-means objective function is shown, and the spectral clustering objective of normalized cut is derived as a special case, leading to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut.

Clustered Nyström Method for Large Scale Manifold Learning and Dimension Reduction

- Mathematics, Computer Science · IEEE Transactions on Neural Networks
- 2010

The (non-probabilistic) error analysis justifies a "clustered Nyström method" that uses the $k$-means cluster centers as landmark points, which can be applied to scale up a wide variety of algorithms that depend on the eigenvalue decomposition of the kernel matrix.

On Coresets for k-Median and k-Means Clustering in Metric and Euclidean Spaces and Their Applications

- Mathematics, Computer Science · SIAM J. Comput.
- 2009

These are the first streaming algorithms for these problems whose space complexity depends only polynomially on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8 n)$ space.

Randomized Dimensionality Reduction for $k$ -Means Clustering

- Computer Science, Mathematics · IEEE Transactions on Information Theory
- 2015

The first provably accurate feature selection method for $k$-means clustering is presented, along with two feature extraction methods that improve upon existing results in terms of time complexity and the number of features that need to be extracted.

Randomized Clustered Nystrom for Large-Scale Kernel Machines

- Mathematics, Computer Science · AAAI
- 2018

A novel algorithm to compute the optimal Nystrom low-rank approximation when the number of landmark points exceeds the target rank is presented, and a randomized algorithm for generating landmark points that scales to large data sets is introduced.

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

- Mathematics, Computer Science · STOC
- 2015

This work shows how to approximate a data matrix $A$ with a much smaller sketch $\tilde{A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\varepsilon)$ error, and gives a simple alternative to known algorithms with applications in the streaming setting.

A local search approximation algorithm for k-means clustering

- Computer Science, Mathematics · SCG '02
- 2002

This work considers the question of whether there exists a simple and practical approximation algorithm for k-means clustering, and presents a local improvement heuristic based on swapping centers in and out that yields a (9+ε)-approximation algorithm.

The Hardness of Approximation of Euclidean k-Means

- Computer Science, Mathematics · SoCG
- 2015

The first hardness of approximation for the Euclidean $k$-means problem is provided via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest vertices that are incident on all the edges.