# Clustering with Distributed Data

@article{Kar2019ClusteringWD, title={Clustering with Distributed Data}, author={Soummya Kar and Brian Swenson}, journal={ArXiv}, year={2019}, volume={abs/1901.00214} }

We consider $K$-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering algorithm referred to as networked $K$-means, or $NK$-means, which relies only on local neighborhood information exchange. Information exchange is limited to low-dimensional statistics and not raw data at the agents. The proposed approach develops a parametric…
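The idea of clustering through neighborhood exchange of low-dimensional statistics can be illustrated with a minimal sketch. This is not the $NK$-means algorithm of the paper, whose details are not given in the abstract above; it is a generic consensus-based variant under stated assumptions: a ring network (so plain neighborhood averaging preserves the network-wide mean), identical initial centers at every node, and hypothetical function names (`local_stats`, `consensus_average`, `networked_kmeans_sketch`). Each node shares only per-cluster sums and counts, never raw data points.

```python
def local_stats(points, centers):
    """Per-cluster sums and counts for one node's data (the low-dimensional statistics)."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0.0] * len(centers)
    for p in points:
        # Assign the point to the nearest center (squared Euclidean distance).
        k = min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        counts[k] += 1.0
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def consensus_average(values, neighbors, rounds):
    """Each round, every node replaces its vector by the mean over itself and its neighbors."""
    for _ in range(rounds):
        new_values = []
        for i, v in enumerate(values):
            group = [v] + [values[j] for j in neighbors[i]]
            new_values.append([sum(col) / len(group) for col in zip(*group)])
        values = new_values
    return values


def networked_kmeans_sketch(node_data, neighbors, centers, iters=10, rounds=50):
    """Illustrative consensus-based K-means; assumes a regular graph such as a ring."""
    n, K, dim = len(node_data), len(centers), len(centers[0])
    # Assumption: every node starts from the same initial centers.
    node_centers = [[list(c) for c in centers] for _ in range(n)]
    for _ in range(iters):
        flat = []
        for i in range(n):
            sums, counts = local_stats(node_data[i], node_centers[i])
            # Flatten to one vector per node: K*dim sums followed by K counts.
            flat.append([x for s in sums for x in s] + counts)
        flat = consensus_average(flat, neighbors, rounds)
        for i in range(n):
            for k in range(K):
                cnt = flat[i][K * dim + k]
                if cnt > 1e-12:  # keep the old center if the cluster is empty
                    node_centers[i][k] = [flat[i][k * dim + d] / cnt for d in range(dim)]
    return node_centers


# Four nodes on a ring, each holding two one-dimensional points.
n = 4
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
node_data = [
    [(0.0,), (10.0,)],
    [(1.0,), (11.0,)],
    [(-1.0,), (9.0,)],
    [(0.0,), (10.0,)],
]
result = networked_kmeans_sketch(node_data, neighbors, [(2.0,), (8.0,)])
```

In this toy example every node ends up with centers close to 0 and 10, the centers a centralized K-means would find on the pooled data, although no node ever sees another node's points.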

## 5 Citations

### Consensus-Based Distributed Clustering for IoT

- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020

Experiments show that the proposed distributed clustering algorithm can offer the same convergence and clustering quality as its centralized counterpart but with less data traffic, and that the proposed algorithm outperforms the existing methods.

### Gradient Based Clustering

- Computer Science
- ICML
- 2022

A general approach for distance-based clustering that uses the gradient of the cost function measuring clustering quality with respect to cluster assignments and cluster center positions; the method is shown to converge to the set of appropriately defined points under arbitrary center initialization.

### A parallel ADMM-based convex clustering method

- Computer Science
- EURASIP Journal on Advances in Signal Processing
- 2022

This paper develops a parallel ADMM-based method for a modified convex clustering sum-of-norms (SON) formulation for master–worker architectures, and provides an efficient, open-source implementation for high-performance computing (HPC) cluster environments.

### Distributed Global Optimization by Annealing

- Computer Science
- 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
- 2019

This paper studies a first-order consensus + innovations type algorithm that incorporates decaying additive Gaussian noise for annealing and converges to the set of global minima under certain technical assumptions.

### Distributed Gradient Descent: Nonconvergence to Saddle Points and the Stable-Manifold Theorem

- Mathematics, Computer Science
- 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2019

The paper develops an appropriate stable-manifold theorem for DGD that shows that convergence to saddle points may only occur from a low-dimensional stable manifold and implies that DGD almost always converges to local minima.
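For readers unfamiliar with distributed gradient descent (DGD), the basic update can be sketched as follows: each node averages its state with its neighbors, then takes a step along the negative gradient of its own local cost. The example below is only an illustration of that update and assumes convex quadratic local costs, a ring network, uniform averaging weights, and a constant step size; it does not reproduce the nonconvex setting or the stable-manifold analysis of the paper, and the name `dgd_quadratic` is hypothetical.

```python
def dgd_quadratic(targets, neighbors, step=0.1, iters=500):
    """DGD on local costs f_i(x) = 0.5 * (x - targets[i])**2 (an illustrative choice)."""
    x = [0.0] * len(targets)
    for _ in range(iters):
        new_x = []
        for i, xi in enumerate(x):
            group = [xi] + [x[j] for j in neighbors[i]]
            mixed = sum(group) / len(group)  # consensus (averaging) step
            grad = xi - targets[i]           # gradient of the local cost at x_i
            new_x.append(mixed - step * grad)
        x = new_x
    return x


# Four nodes on a ring; the sum of the local costs is minimized at the mean of the targets (4.0).
n = 4
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
targets = [1.0, 5.0, 3.0, 7.0]
x_final = dgd_quadratic(targets, ring)
```

With a constant step size the network average of the states settles at the global minimizer, while individual nodes remain within a small step-size-dependent distance of it.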

## References

### Distributed k-means algorithm

- Computer Science
- ArXiv
- 2013

In this paper we provide a fully distributed implementation of the k-means clustering algorithm, intended for wireless sensor networks where each agent is endowed with a possibly high-dimensional…

### Robust Communication-Optimal Distributed Clustering Algorithms

- Computer Science
- ICALP
- 2019

This work gives a matching $\Omega(sk+z)$ lower bound on the communication required both for approximating the optimal k-median or k-means objective value up to any constant, and for returning a clustering that is close to the target clustering in Hamming distance.

### Distributed optimization in sensor networks

- Computer Science
- Third International Symposium on Information Processing in Sensor Networks, 2004. IPSN 2004
- 2004

This paper investigates a general class of distributed algorithms for "in-network" data processing, eliminating the need to transmit raw data to a central point, and shows that for a broad class of estimation problems the distributed algorithms converge to within an $\epsilon$-ball around the globally optimal value.

### Approximate Distributed K-Means Clustering over a Peer-to-Peer Network

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2009

This paper offers two algorithms that produce an approximation of the result of the standard centralized K-means clustering algorithm and are designed to operate in a dynamic P2P network, producing clusterings by "local" synchronization only.

### Randomized gossip algorithms

- Computer Science
- IEEE Transactions on Information Theory
- 2006

This work analyzes the averaging problem under the gossip constraint for an arbitrary network graph, and finds that the averaging time of a gossip algorithm depends on the second largest eigenvalue of a doubly stochastic matrix characterizing the algorithm.
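The averaging problem under the gossip constraint can be illustrated with a small simulation. The sketch below assumes the simplest pairwise scheme: at each step one edge is chosen uniformly at random and its two endpoints replace their values with their average. The uniform edge selection, the ring topology, and the function name `randomized_gossip` are assumptions made for illustration; the sketch does not compute the eigenvalue that governs the averaging time.

```python
import random


def randomized_gossip(values, edges, steps, seed=0):
    """Pairwise gossip: a random edge's endpoints average their values at each step."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg  # the pair's sum, and hence the global sum, is preserved
    return x


# Four nodes on a ring; the true average of the initial values is 4.0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x0 = [1.0, 5.0, 3.0, 7.0]
x = randomized_gossip(x0, edges, steps=200)
```

Every pairwise update preserves the sum of the values, so the only point the nodes can agree on is the true average; how quickly the spread between nodes shrinks depends on the connectivity of the graph.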

### Distributed $k$-Means Algorithm and Fuzzy $c$-Means Algorithm for Sensor Networks Based on Multiagent Consensus Theory

- Computer Science
- IEEE Transactions on Cybernetics
- 2017

Simulation results show that the proposed distributed algorithms can achieve almost the same results as those given by the centralized clustering algorithms.

### Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

- Computer Science
- J. Parallel Distributed Comput.
- 2013

### General and Robust Communication-Efficient Algorithms for Distributed Clustering

- Computer Science
- ArXiv
- 2017

This work gives a distributed approximation algorithm for k-means, k-median, or generally any $\ell_p$ objective, with $z$ outliers and/or balance constraints, using $O(m(k+z)(d+\log n))$ bits of communication, where $m$ is the number of machines, $n$ is the size of the point set, and $d$ is the dimension.

### Fast Distributed k-Center Clustering with Outliers on Massive Data

- Computer Science
- NIPS
- 2015

This work considers the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers, and demonstrates how a previously proposed distributed method is actually an O(1)-approximation algorithm, which accurately explains its strong empirical performance.