• Corpus ID: 57373874

Clustering with Distributed Data

Soummya Kar, Brian Swenson
We consider $K$-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering algorithm referred to as networked $K$-means, or $NK$-means, which relies only on local neighborhood information exchange. Information exchange is limited to low-dimensional statistics and not raw data at the agents. The proposed approach develops a parametric… 
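The idea of exchanging low-dimensional statistics instead of raw data can be illustrated with a minimal Python sketch. This is not the $NK$-means algorithm itself, whose details are truncated in the abstract above. It is a generic consensus-averaged Lloyd step under assumed conditions: all nodes start from the same centers, and the weight matrix `W` is symmetric and doubly stochastic. The function names, the three-node path graph, and the toy data are hypothetical.

```python
import numpy as np

def local_stats(points, centers):
    """Per-cluster sums and counts for one node's local points."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)              # nearest-center assignment
    sums = np.zeros_like(centers)
    counts = np.zeros(centers.shape[0])
    np.add.at(sums, labels, points)         # accumulate points per cluster
    np.add.at(counts, labels, 1.0)
    return sums, counts

def consensus_average(stack, W, num_iters):
    """Repeated neighbor averaging; axis 0 of `stack` indexes the nodes."""
    out = stack.copy()
    for _ in range(num_iters):
        out = np.tensordot(W, out, axes=1)  # each node mixes neighbor values
    return out

def distributed_lloyd_step(node_data, centers, W, num_iters=200):
    """One center update in which nodes share only sums and counts."""
    stats = [local_stats(p, centers) for p in node_data]
    sums = consensus_average(np.stack([s for s, _ in stats]), W, num_iters)
    counts = consensus_average(np.stack([c for _, c in stats]), W, num_iters)
    safe = np.maximum(counts[:, :, None], 1e-12)
    # Globally empty clusters keep their previous center.
    return np.where(counts[:, :, None] > 0, sums / safe, centers[None])

# Assumed toy setup: three nodes on a path graph with Metropolis weights.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
node_data = [np.array([[0.0, 0.0], [1.0, 0.0]]),
             np.array([[0.0, 1.0], [5.0, 5.0]]),
             np.array([[6.0, 5.0], [5.0, 6.0]])]
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
new_centers = distributed_lloyd_step(node_data, centers, W)  # shape (3, 2, 2)
```

Each node ends up with its own copy of the updated centers. Because averaged sums divided by averaged counts equal pooled sums divided by pooled counts, every copy agrees with the centralized Lloyd update once the consensus iterations have converged.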

Consensus-Based Distributed Clustering for IoT

Experiments show that the proposed distributed clustering algorithm can offer the same convergence and clustering quality as its centralized counterpart but with less data traffic, and that it outperforms the existing methods.

Gradient Based Clustering

This paper proposes a general approach for distance-based clustering that uses the gradient of the cost function, which measures clustering quality with respect to cluster assignments and cluster center positions, and shows that it converges to the set of appropriately defined points under arbitrary center initialization.

A parallel ADMM-based convex clustering method

This paper develops a parallel, ADMM-based method for a modified convex clustering sum-of-norms (SON) formulation for master–worker architectures, and provides an efficient, open-source implementation for high-performance computing (HPC) cluster environments.

Distributed Global Optimization by Annealing

This paper studies a first-order consensus + innovations type algorithm that incorporates decaying additive Gaussian noise for annealing and shows that it converges to the set of global minima under certain technical assumptions.

Distributed Gradient Descent: Nonconvergence to Saddle Points and the Stable-Manifold Theorem

The paper develops an appropriate stable-manifold theorem for DGD that shows that convergence to saddle points may only occur from a low-dimensional stable manifold and implies that DGD almost always converges to local minima.



Distributed k-means algorithm

In this paper we provide a fully distributed implementation of the k-means clustering algorithm, intended for wireless sensor networks where each agent is endowed with a possibly high-dimensional

Robust Communication-Optimal Distributed Clustering Algorithms

This work gives a matching $\Omega(sk+z)$ lower bound on the communication required both for approximating the optimal k-median or k-means objective value up to any constant, and for returning a clustering that is close to the target clustering in Hamming distance.

Distributed optimization in sensor networks

  • M. Rabbat, R. Nowak
  • Computer Science
    Third International Symposium on Information Processing in Sensor Networks, 2004. IPSN 2004
  • 2004
This paper investigates a general class of distributed algorithms for "in-network" data processing, eliminating the need to transmit raw data to a central point, and shows that for a broad class of estimation problems the distributed algorithms converge to within an $\epsilon$-ball around the globally optimal value.

Approximate Distributed K-Means Clustering over a Peer-to-Peer Network

This paper offers two algorithms that approximate the result of the standard centralized K-means clustering algorithm. Both are designed to operate in a dynamic P2P network and can produce clusterings by "local" synchronization only.

Randomized gossip algorithms

This work analyzes the averaging problem under the gossip constraint for an arbitrary network graph, and finds that the averaging time of a gossip algorithm depends on the second largest eigenvalue of a doubly stochastic matrix characterizing the algorithm.
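The dependence on the second largest eigenvalue can be explored with a minimal Python sketch. It assumes a simple pairwise scheme in which one edge is chosen uniformly at random per round and its two endpoints replace their values with their mean; the function names, the six-node ring, and the initial values are hypothetical and not taken from the paper.

```python
import numpy as np

def gossip_average(values, edges, num_rounds, seed=0):
    """Randomized pairwise gossip: a random edge averages its endpoints."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).copy()
    for _ in range(num_rounds):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = 0.5 * (x[i] + x[j])   # preserves the network-wide sum
    return x

def expected_gossip_matrix(n, edges):
    """Expected one-round update matrix under uniform edge selection."""
    W = np.zeros((n, n))
    for i, j in edges:
        W_ij = np.eye(n)
        W_ij[i, i] = W_ij[j, j] = 0.5
        W_ij[i, j] = W_ij[j, i] = 0.5
        W += W_ij / len(edges)
    return W

def second_largest_eigenvalue(W):
    """Second largest eigenvalue of a symmetric matrix."""
    return np.sort(np.linalg.eigvalsh(W))[-2]

# Assumed example: six nodes on a ring.
ring = [(k, (k + 1) % 6) for k in range(6)]
x0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x_final = gossip_average(x0, ring, num_rounds=2000)
W = expected_gossip_matrix(6, ring)
lam2 = second_largest_eigenvalue(W)
```

The matrix `W` here is symmetric and doubly stochastic, and the closer `lam2` is to 1, the more rounds the values need to approach the network average.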

Distributed $k$-Means Algorithm and Fuzzy $c$-Means Algorithm for Sensor Networks Based on Multiagent Consensus Theory

Simulation results show that the proposed distributed algorithms can achieve almost the same results as those given by the centralized clustering algorithms.

Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

General and Robust Communication-Efficient Algorithms for Distributed Clustering

This work gives a distributed approximation algorithm for k-means, k-median, or generally any $\ell_p$ objective, with z outliers and/or balance constraints, using $O(m(k+z)(d+\log n))$ bits of communication, where m is the number of machines, n is the size of the point set, and d is the dimension.

Fast Distributed k-Center Clustering with Outliers on Massive Data

This work considers the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers, and demonstrates how a previously-proposed distributed method is actually an O(1)-approximation algorithm, which accurately explains its strong empirical performance.