Parallel K-Means Clustering with Triangle Inequality
@inproceedings{Krohn2016ParallelKC, title={Parallel K-Means Clustering with Triangle Inequality}, author={Rachel Krohn and Christer Karlsson}, year={2016} }
Clustering divides data objects into groups to minimize the variation within each group. This technique is widely used in data mining and other areas of computer science. K-means is a partitional clustering algorithm that produces a fixed number of clusters through an iterative process. The relative simplicity and obvious data parallelism of the K-means algorithm make it an excellent candidate for distributed-memory parallel optimization, particularly as datasets grow beyond the size of a…
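The iterative process described in the abstract can be sketched as Lloyd-style K-means in plain Python. This is a minimal illustration, not the authors' implementation. The helper names (`assign`, `update`, `kmeans`), the random initialization from the input points, and the rule that an empty cluster keeps its old center are all assumptions. The assignment step treats each point independently, which is the data parallelism the abstract refers to.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Assignment step: each point independently picks its nearest center.

    No point depends on any other, so the points could be split across
    workers and processed in parallel without coordination.
    """
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def update(points: List[Point], labels: List[int],
           centers: List[Point]) -> List[Point]:
    """Update step: move each center to the mean of its assigned points."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]
    # Assumption: an empty cluster keeps its previous center.
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(k)]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and update until the labels stop changing."""
    # Assumption: initial centers are k input points sampled without replacement.
    centers = random.Random(seed).sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update(points, labels, centers)
    return centers, labels
```

On two well-separated groups, `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the loop converges to centers at (1/3, 1/3) and (31/3, 31/3).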
2 Citations
Analyzing Digital Evidence Using Parallel k-means with Triangle Inequality on Spark
- Computer Science
- 2018 IEEE International Conference on Big Data (Big Data)
- 2018
Experimental results show that the parallel implementation of k-meansTI on Spark can be faster than the Spark ML k-means when the dataset is large, high-dimensional, and contains little sparse data.
Efficient k-means Using Triangle Inequality on Spark for Cyber Security Analytics
- Computer Science
- Proceedings of the ACM International Workshop on Security and Privacy Analytics - IWSPA '19
- 2019
This paper reformulates the parallel version of Elkan's k-means with triangle inequality (k-meansTI) algorithm, implements it on Apache Spark, uses it to group Web attacks into clusters, and compares the speed of the parallel k-meansTI on Spark with that of the Spark ML k-means clustering algorithm.
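Both citing papers build on Elkan's use of the triangle inequality to skip distance computations. A minimal sketch of the core pruning rule, assuming Euclidean distance and at least two centers: if d(c, c') >= 2·d(x, c), then d(x, c') >= d(x, c), so c' cannot be closer to x than c and its distance need not be computed. Full Elkan k-means also maintains per-point upper and lower bounds across iterations; this sketch omits them and prunes within a single assignment pass only. The function name `assign_with_pruning` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points: List[Point],
                        centers: List[Point]) -> Tuple[List[int], int]:
    """Nearest-center assignment that skips distances ruled out by the
    triangle inequality. Requires at least two centers.

    Returns the labels and the number of point-to-center distances computed.
    """
    k = len(centers)
    # Pairwise center distances: O(k^2) work, independent of the point count.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    # s[i] is half the distance from center i to its nearest other center.
    s = [0.5 * min(cc[i][j] for j in range(k) if j != i) for i in range(k)]
    labels: List[int] = []
    computed = 0
    for p in points:
        best = 0
        d_best = dist(p, centers[0])
        computed += 1
        # If d_best <= s[best], every other center is at least as far away.
        if d_best > s[best]:
            for j in range(1, k):
                # d(best, c_j) >= 2 d(p, best) implies d(p, c_j) >= d(p, best).
                if cc[best][j] >= 2.0 * d_best:
                    continue
                d = dist(p, centers[j])
                computed += 1
                if d < d_best:
                    best, d_best = j, d
        labels.append(best)
    return labels, computed
```

Because the rule only skips centers that provably cannot be closer, the labels match a brute-force nearest-center search (up to ties). The savings show up in the second return value, which falls below n·k when the clusters are well separated.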
References
Parallel K-Means Algorithm for Shared Memory Multiprocessors
- Computer Science
- 2014
The aim of this work is to provide a theoretical analysis of the performance of the k-means algorithm and to present extensive tests on a shared-memory architecture.
A Parallel Clustering Algorithm with MPI - MKmeans
- Computer Science
- J. Comput.
- 2013
A parallel K-means clustering algorithm using MPI, called MKmeans, is proposed; it allows clustering algorithms to be applied effectively in a parallel environment and is shown to be relatively stable and portable.
Fast and exact out-of-core and distributed k-means clustering
- Computer Science
- Knowledge and Information Systems
- 2005
This paper presents a new algorithm, fast and exact k-means clustering (FEKM), which typically requires only one or a small number of passes over the entire dataset and provably produces the same cluster centres as the original k-means algorithm.
Parallel K-Means Clustering Based on MapReduce
- Computer Science
- CloudCom
- 2009
This paper proposes a parallel k-means clustering algorithm based on MapReduce, a simple yet powerful parallel programming technique, and demonstrates that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.
A Parallel Implementation of K-Means Clustering on GPUs
- Computer Science
- PDPTA
- 2008
This paper takes a first step toward an efficient GPU-based parallel implementation of K-means, a commonly used clustering algorithm, on an NVIDIA G80 PCI Express graphics board using the CUDA processing extensions.
A Data-Clustering Algorithm on Distributed Memory Multiprocessors
- Computer Science
- Large-Scale Parallel Data Mining
- 1999
To cluster the increasingly massive datasets now common in data and text mining, this paper proposes a parallel implementation of the k-means clustering algorithm based on the message-passing model and shows analytically that the algorithm's speedup and scaleup approach optimal as the number of data points increases.
Parallel k-Means Clustering for Quantitative Ecoregion Delineation Using Large Data Sets
- Environmental Science
- ICCS
- 2011
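The MapReduce formulation summarized above (Parallel K-Means Clustering Based on MapReduce) can be illustrated by one k-means iteration split into map, combine, and reduce functions, simulated here in plain Python rather than on Hadoop. This is a generic sketch under stated assumptions, not the paper's code: each input split is a Python list, a mapper-side combiner pre-aggregates per-cluster sums, a single reducer merges all cluster ids (a real job would partition by cluster id), and all function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (per-dimension sum, point count) for one cluster


def nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def map_phase(split: Iterable[Point],
              centers: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Map: emit a (cluster id, point) pair for every point in one input split."""
    for p in split:
        yield nearest(p, centers), p


def combine(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, Partial]:
    """Combine: pre-aggregate per-cluster sums on the mapper side, so at most
    k small records per split need to be shuffled to the reducer."""
    acc: Dict[int, Partial] = {}
    for j, p in pairs:
        s, n = acc.get(j, ([0.0] * len(p), 0))
        acc[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return acc


def reduce_phase(partials: Iterable[Dict[int, Partial]],
                 centers: List[Point]) -> List[Point]:
    """Reduce: merge the partial sums from all splits, then divide by counts."""
    total: Dict[int, Partial] = {}
    for part in partials:
        for j, (s, n) in part.items():
            ts, tn = total.get(j, ([0.0] * len(s), 0))
            total[j] = ([a + b for a, b in zip(ts, s)], tn + n)
    new_centers: List[Point] = []
    for j, c in enumerate(centers):
        if j in total:
            s, n = total[j]
            new_centers.append(tuple(x / n for x in s))
        else:
            new_centers.append(c)  # assumption: an empty cluster keeps its center
    return new_centers


def mapreduce_iteration(splits: List[List[Point]],
                        centers: List[Point]) -> List[Point]:
    """One k-means iteration: map + combine per split, then a single reduce."""
    partials = [combine(map_phase(split, centers)) for split in splits]
    return reduce_phase(partials, centers)
```

Because the partial sums are combined by addition, splitting the same points differently across mappers yields the same new centers (up to floating-point rounding), which is what lets the update step scale out across commodity machines.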