A MapReduce-based parallel K-means clustering for large-scale CIM data verification


The Common Information Model (CIM) is heavily used in electric power grids for data exchange among auxiliary systems such as communication, monitoring and marketing systems. With the rapid deployment of digitalized devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means algorithm for large-scale CIM data verification based on the MapReduce computing model, which has been widely adopted by the community for data-intensive applications. By distributing the CIM data across a number of computers in a MapReduce cluster environment, the computation efficiency of CIM data verification is significantly improved. Furthermore, a load balancing scheme is designed to balance the workloads among heterogeneous MapReduce computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means clustering in CIM data verification is first evaluated in a small-scale experimental MapReduce cluster and subsequently in a large-scale simulation environment.
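The abstract does not give implementation details, so the following is only a minimal single-process sketch of how one K-means iteration is commonly expressed in map and reduce steps. All function names are hypothetical, and the capacity-proportional split is an assumed stand-in for the paper's load balancing scheme, not a description of it.

```python
from math import dist  # Euclidean distance, Python 3.8+


def split_by_capacity(points, capacities):
    """Split points into chunks sized in proportion to each node's
    relative capacity (an assumed, simplified load balancing rule)."""
    total = sum(capacities)
    chunks, start, acc = [], 0, 0
    for i, cap in enumerate(capacities):
        acc += cap
        # The last node takes whatever remains, so no point is dropped.
        last = i == len(capacities) - 1
        end = len(points) if last else round(len(points) * acc / total)
        chunks.append(points[start:end])
        start = end
    return chunks


def map_phase(chunk, centroids):
    """Assign each point to its nearest centroid and emit, per centroid
    index, the locally combined (coordinate sums, point count)."""
    partial = {}
    for p in chunk:
        k = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        sums, count = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([s + x for s, x in zip(sums, p)], count + 1)
    return list(partial.items())


def reduce_phase(mapped, centroids):
    """Merge the partial sums per centroid index and return new centroids.
    A centroid that received no points keeps its previous position."""
    totals = {}
    for k, (sums, count) in mapped:
        t_sums, t_count = totals.get(k, ([0.0] * len(sums), 0))
        totals[k] = ([a + b for a, b in zip(t_sums, sums)], t_count + count)
    new_centroids = []
    for k, old in enumerate(centroids):
        if k in totals:
            sums, count = totals[k]
            new_centroids.append(tuple(s / count for s in sums))
        else:
            new_centroids.append(tuple(old))
    return new_centroids


def kmeans_iteration(points, centroids, capacities):
    """One iteration: split the data, run map on every chunk, then reduce."""
    chunks = split_by_capacity(points, capacities)
    mapped = [pair for chunk in chunks for pair in map_phase(chunk, centroids)]
    return reduce_phase(mapped, centroids)
```

In a real MapReduce cluster each chunk would be processed by a separate node and the iteration repeated until the centroids stop moving; here the chunks are simply processed in a loop to keep the sketch self-contained.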

DOI: 10.1002/cpe.3580

Cite this paper

@article{Deng2016AMP, title={A MapReduce-based parallel K-means clustering for large-scale CIM data verification}, author={Chuang Deng and Yang Liu and Lixiong Xu and Jie Yang and Junyong Liu and Siguang Li and Maozhen Li}, journal={Concurrency and Computation: Practice and Experience}, year={2016}, volume={28}, pages={3096-3114} }