Exploiting Variant-Based Parallelism for Data Mining of Space Weather Phenomena

Abstract

This paper studies a form of parallelism termed variant-based parallelism, which exploits commonalities and reuse among variant computations to improve multithreading scalability. The problem is motivated by space weather studies that aim to identify changes in the Earth's ionosphere caused by auroral activity, tsunamis, and earthquakes. It is now common to execute clustering algorithm variants with different parameters to determine which ones best explain phenomena in empirical data. We propose a novel approach and a set of optimizations to maximize throughput for such clustering workloads. This is achieved by executing multiple clustering algorithm variants in parallel and by developing efficient approaches that cluster data concurrently and maximize the reuse of results from completed variants. We present evaluations on real-world space weather datasets with up to 5 million ionospheric total electron content data points, as well as on synthetic datasets with up to a million data points. Results show a 1101% performance improvement due to indexing tailored for variant-based clustering, and a 2209% performance improvement when all of our proposed optimizations are applied. Our optimizations enable new approaches in computer-aided discovery and could provide the short run times required by early warning systems for natural hazards.
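The general idea of running several clustering variants concurrently over shared data can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a DBSCAN-style algorithm (the abstract does not name one), a simple uniform grid as the shared index, and hypothetical names (`build_grid`, `neighbors`, `dbscan`, `run_variants`). It shares only the index between variants and does not reuse results from completed variants. CPython threads are also limited by the global interpreter lock, so this shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil, dist, floor

NOISE = -1  # label for points that belong to no cluster


def build_grid(points, cell):
    """Bucket point indices into square cells; built once, shared by all variants."""
    grid = {}
    for i, (x, y) in enumerate(points):
        grid.setdefault((floor(x / cell), floor(y / cell)), []).append(i)
    return grid


def neighbors(points, grid, cell, i, eps):
    """Indices of all points within eps of point i (point i included)."""
    cx, cy = floor(points[i][0] / cell), floor(points[i][1] / cell)
    reach = ceil(eps / cell)  # number of cells to scan in each direction
    found = []
    for gx in range(cx - reach, cx + reach + 1):
        for gy in range(cy - reach, cy + reach + 1):
            for j in grid.get((gx, gy), ()):
                if dist(points[i], points[j]) <= eps:
                    found.append(j)
    return found


def dbscan(points, grid, cell, eps, min_pts):
    """Plain DBSCAN that answers range queries through the shared grid."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, grid, cell, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(points, grid, cell, j, eps)
            if len(reachable) >= min_pts:  # only core points expand the cluster
                queue.extend(reachable)
        cluster += 1
    return labels


def run_variants(points, variants, cell=1.0, workers=4):
    """Run one DBSCAN per (eps, min_pts) variant concurrently over one shared grid."""
    grid = build_grid(points, cell)  # read-only after construction
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {v: pool.submit(dbscan, points, grid, cell, *v) for v in variants}
        return {v: f.result() for v, f in futures.items()}


if __name__ == "__main__":
    pts = [(0, 0), (0, 0.5), (0.5, 0), (10, 10), (10, 10.5), (10.5, 10), (50, 50)]
    for variant, labels in run_variants(pts, [(1.0, 3), (1.0, 4)]).items():
        print(variant, labels)
```

Because the grid is never modified after construction, the variants can read it concurrently without locking, which is one simple way to share work among variants that differ only in their parameters.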

DOI: 10.1109/IPDPS.2016.10

Cite this paper

@article{Gowanlock2016ExploitingVP,
  title={Exploiting Variant-Based Parallelism for Data Mining of Space Weather Phenomena},
  author={Michael G. Gowanlock and David M. Blair and Victor Pankratius},
  journal={2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  year={2016},
  pages={760--769}
}