# Quantized Compressive K-Means

@article{Schellekens2018QuantizedCK, title={Quantized Compressive K-Means}, author={V. Schellekens and L. Jacques}, journal={IEEE Signal Processing Letters}, year={2018}, volume={25}, pages={1211-1215} }

The recent framework of compressive statistical learning proposes to design tractable learning algorithms that use only a heavily compressed representation, or sketch, of massive datasets. Compressive K-Means (CKM) is such a method: it aims at estimating the centroids of data clusters from pooled, nonlinear, and random signatures of the learning examples. While this approach significantly reduces computational time on very large datasets, its digital implementation wastes acquisition resources…
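As a rough illustration of what "pooled, nonlinear, and random signatures" can look like, the following minimal sketch averages random Fourier features of the data into a single vector whose size does not depend on the number of examples. Everything specific here is an assumption rather than the authors' implementation: the Gaussian frequency matrix `Omega`, the uniform dither `xi`, the function name `compute_sketch`, and in particular the 1-bit variant (sign of a dithered cosine), which is only one plausible reading of "quantized" since the abstract above is truncated.

```python
import numpy as np


def compute_sketch(X, Omega, xi=None, quantize=False):
    """Pool random nonlinear signatures of the rows of X into one vector.

    X     : (N, d) data matrix, one example per row.
    Omega : (d, m) random frequency matrix (assumed Gaussian here).
    xi    : optional (m,) dither added to the projections (assumption).
    """
    proj = X @ Omega                      # (N, m) random projections
    if xi is not None:
        proj = proj + xi                  # broadcast the dither over examples
    if quantize:
        # Hypothetical 1-bit signature: keep only the sign of the cosine.
        sig = np.where(np.cos(proj) >= 0, 1.0, -1.0)
    else:
        # Full-precision signature: complex exponential (random Fourier feature).
        sig = np.exp(1j * proj)
    return sig.mean(axis=0)               # pooling: average over all examples


rng = np.random.default_rng(0)
N, d, m = 10_000, 2, 50                   # examples, dimension, sketch size

# Toy data: two well-separated Gaussian clusters.
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
X = centers[rng.integers(0, 2, size=N)] + 0.3 * rng.standard_normal((N, d))

Omega = rng.standard_normal((d, m))       # assumed frequency distribution
xi = rng.uniform(0.0, 2.0 * np.pi, size=m)

z = compute_sketch(X, Omega)                          # complex, length m
z_q = compute_sketch(X, Omega, xi=xi, quantize=True)  # real, length m
```

Both `z` and `z_q` have length `m` regardless of `N`, which is the point of sketching: centroid estimation would then work from this vector alone. The centroid recovery step itself is not shown here.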

#### 12 Citations

Asymmetric compressive learning guarantees with applications to quantized sketches

- Computer Science, Mathematics
- ArXiv
- 2021

This work proves that the existing guarantees carry over to this asymmetric scheme, up to a controlled error term, provided some Limited Projected Distortion (LPD) property holds.

Differentially Private Compressive K-means

- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019

This work modified the standard sketching mechanism to provide differential privacy, using addition of Laplace noise combined with a subsampling mechanism (each moment is computed from a subset of the dataset) to deal with the large scale of datasets.

Compressive learning with privacy guarantees

- Computer Science
- 2021

This work shows that a simple perturbation of this mechanism with additive noise is sufficient to satisfy differential privacy, a well established formalism for defining and quantifying the privacy of a random mechanism.

Sketched clustering via hybrid approximate message passing

- Computer Science, Mathematics
- 2017 51st Asilomar Conference on Signals, Systems, and Computers
- 2017

A cluster recovery algorithm based on simplified hybrid generalized approximate message passing (SHyGAMP) is proposed, which is more efficient than the state-of-the-art sketched clustering algorithms (in both computational and sample complexity) and more efficient than k-means++ in certain regimes.

The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

- Computer Science
- 2020

Variants of the k-means algorithms including their recent developments are discussed, where their effectiveness is investigated based on the experimental analysis of a variety of datasets.

Making AI Forget You: Data Deletion in Machine Learning

- Computer Science, Mathematics
- NeurIPS
- 2019

This paper proposes two provably efficient deletion algorithms which achieve an average of over 100X improvement in deletion efficiency across 6 datasets, while producing clusters of comparable statistical quality to a canonical k-means++ baseline.

Compressive Classification (Machine Learning without learning)

- Mathematics, Computer Science
- ArXiv
- 2018

A compressive learning classification method, and a novel sketch function for images, that combines supervised and unsupervised learning methods for compressed learning of images.

A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost

- Computer Science, Medicine
- Entropy
- 2020

The present study proposed a method of cutting down clustering calculation costs by applying an initial center point approach based on space division and outliers, so that no objects would be subordinate to the initial cluster center and dependence on the initial cluster center would be lower.

A Study on Correlation Analysis and Application of Communication Network Service

- Computer Science
- 2018 4th International Conference on Universal Village (UV)
- 2018

A correlation analysis of mobile data, including a correlation analysis between a user’s mobile network business and district, between business and time, and between business and business is conducted, useful for achieving the optimization of resource allocation and communication network service provided by operators, contributing to the better guidance of urban planning.

SoyNet: Soybean leaf diseases classification

- Computer Science
- Comput. Electron. Agric.
- 2020

A computer vision approach for plant disease classification using a deep learning convolutional neural network, SoyNet, for soybean plant disease recognition from segmented leaf images, which outperforms nine state-of-the-art methods/models.

#### References

SHOWING 1-10 OF 38 REFERENCES

Compressive K-means

- Computer Science, Mathematics
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017

This work proposes a compressive version of K-means, that estimates cluster centers from a sketch, i.e. from a drastically compressed representation of the training dataset, and demonstrates empirically that CKM performs similarly to Lloyd-Max, for a sketch size proportional to the number of centroids times the ambient dimension, and independent of the size of the original dataset.

Compressive Statistical Learning with Random Feature Moments

- Computer Science, Mathematics
- ArXiv
- 2017

A general framework, compressive statistical learning, for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch that captures the information relevant to the considered learning task.

Sketching for large-scale learning of mixture models

- Computer Science, Mathematics
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016

This work proposes a "compressive learning" framework where we first sketch the data by computing random generalized moments of the underlying probability distribution, then estimate mixture model parameters from the sketch using an iterative algorithm analogous to greedy sparse signal recovery.

Orthogonal Matching Pursuit with Replacement

- Computer Science, Mathematics
- NIPS
- 2011

This paper proposes a novel partial hard-thresholding operator that leads to a general family of iterative algorithms that includes Orthogonal Matching Pursuit with Replacement (OMPR), and extends OMPR using locality sensitive hashing to get OMPR-Hash, the first provably sub-linear algorithm for sparse recovery.

K-Means Algorithm Over Compressed Binary Data

- Computer Science
- 2018 Data Compression Conference
- 2018

A network of binary-valued sensors with a fusion center is considered and it is shown that applying the K-means algorithm directly over the compressed data, without reconstructing the original sensor measurements, makes it possible to recover the clusters of the original domain.

SketchMLbox -- A MATLAB toolbox for large-scale mixture learning

- Computer Science
- 2018

The SketchMLbox is a Matlab toolbox for fitting mixture models to large databases using sketching techniques and is structured so that new mixture models can be easily implemented.

Representation and Coding of Signal Geometry

- Mathematics, Computer Science
- ArXiv
- 2015

This paper considers randomized embeddings as an encoding mechanism and provides a framework to analyze their performance, and demonstrates that it is possible to design the embedding such that it represents different ranges of distances with different precision.

An Introduction To Compressive Sampling

- Computer Science
- IEEE Signal Processing Magazine
- 2008

The theory of compressive sampling, also known as compressed sensing or CS, is surveyed, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition.

Large-Scale High-Dimensional Clustering with Fast Sketching

- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018

To cope with high-dimensional datasets, it is shown how to use fast structured random matrices to compute the sketching operator efficiently, and the clustering results are shown to be much more stable, both on artificial and real datasets.

Stable signal recovery from incomplete and inaccurate measurements

- Physics, Mathematics
- 2005

Suppose we wish to recover a vector x_0 ∈ R^m (e.g., a digital signal or image) from incomplete and contaminated observations y = Ax_0 + e; A is an n by m matrix with far fewer rows than columns (n ≪…