Corpus ID: 37647537

Parallel K-Means Clustering with Triangle Inequality

@inproceedings{Krohn2016ParallelKC,
  title={Parallel K-Means Clustering with Triangle Inequality},
  author={Rachel Krohn and Christer Karlsson},
  year={2016}
}
Clustering divides data objects into groups to minimize the variation within each group. This technique is widely used in data mining and other areas of computer science. K-means is a partitional clustering algorithm that produces a fixed number of clusters through an iterative process. The relative simplicity and obvious data parallelism of the K-means algorithm make it an excellent candidate for distributed-memory parallel optimization, particularly as datasets grow beyond the size of a… 
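The data parallelism mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`assign_chunk`, `kmeans_step`, `kmeans`) are hypothetical, the chunks are processed in a plain loop rather than on separate processes, and the summation in `kmeans_step` stands in for what a distributed-memory program would likely do with a global reduction.

```python
import math

def assign_chunk(points, centroids):
    """Local step for one worker: per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Nearest centroid by Euclidean distance.
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def kmeans_step(chunks, centroids):
    """One iteration: local partial results, then a global reduction."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for chunk in chunks:  # assumption: each chunk would live on its own process
        sums, counts = assign_chunk(chunk, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # Keep the old centroid when a cluster receives no points.
    return [
        [s / total_counts[j] for s in total_sums[j]]
        if total_counts[j] else list(centroids[j])
        for j in range(k)
    ]

def kmeans(chunks, centroids, max_iter=100):
    """Repeat the step until the centroids stop changing."""
    for _ in range(max_iter):
        new = kmeans_step(chunks, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids
```

Only the per-cluster sums and counts cross chunk boundaries, so the amount of data exchanged per iteration depends on the number of clusters and dimensions, not on the number of points.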

Analyzing Digital Evidence Using Parallel k-means with Triangle Inequality on Spark

Experimental results show that the parallel implementation of k-meansTI on Spark can be faster than Spark ML k-means when the data set is large, high dimensional, and not very sparse.

Efficient k-means Using Triangle Inequality on Spark for Cyber Security Analytics

This paper reformulates the parallel version of Elkan's k-means with triangle inequality (k-meansTI) algorithm, implements it on Apache Spark, uses it to group Web attacks into clusters, and compares the speed of parallel k-meansTI on Spark with that of the Spark ML k-means clustering algorithm.
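The triangle-inequality idea behind these methods rests on a simple fact: if d(c_a, c_b) >= 2 * d(x, c_a), then d(x, c_b) >= d(x, c_a), so centroid c_b need not be checked for point x. The sketch below applies only this one test within a single assignment pass. It is not the full Elkan algorithm, which also carries distance bounds across iterations, and the function name `assign_with_pruning` is hypothetical.

```python
import math

def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid, skipping distance
    computations that the triangle inequality shows are unnecessary."""
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per assignment pass.
    cc = [[math.dist(a, b) for b in centroids] for a in centroids]
    labels, skipped = [], 0
    for p in points:
        best, best_d = 0, math.dist(p, centroids[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(p, c_best), the triangle inequality
            # gives d(p, c_j) >= d(p, c_best): c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = math.dist(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The returned `skipped` count is included only to make the saving visible; the assignments match those of a pass that computes every distance, with ties going to the lower centroid index.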

References

Parallel K-Means Algorithm for Shared Memory Multiprocessors

The aim of this work is to provide a theoretical analysis of the performance of the k-means algorithm and to present extensive tests on a shared-memory architecture.

Improvement and parallelism of k-means clustering algorithm

A Parallel Clustering Algorithm with MPI - MKmeans

A parallel K-means clustering algorithm with MPI, called MKmeans, is proposed, which enables applying the clustering algorithms effectively in the parallel environment and is demonstrated to be relatively stable and portable.

Fast and exact out-of-core and distributed k-means clustering

This paper presents a new algorithm, called fast and exact k-means clustering (FEKM), which typically requires only one or a small number of passes on the entire dataset and provably produces the same cluster centres as reported by the original k-Means algorithm.

Parallel K-Means Clustering Based on MapReduce

This paper proposes a parallel k-means clustering algorithm based on MapReduce, a simple yet powerful parallel programming technique, and demonstrates that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.

A Parallel Implementation of K-Means Clustering on GPUs

This paper introduces a first step towards building an efficient GPU-based parallel implementation of a commonly used clustering algorithm called K-Means on an NVIDIA G80 PCI express graphics board using the CUDA processing extensions.

An Introduction to Data Mining

A Data-Clustering Algorithm on Distributed Memory Multiprocessors

To cluster the increasingly massive data sets that are common today in data and text mining, this paper proposes a parallel implementation of the k-means clustering algorithm based on the message-passing model and shows analytically that the speedup and the scaleup of the algorithm approach the optimal as the number of data points increases.