Automatic Learning of Predictive CEP Rules: Bridging the Gap between Data Mining and Complex Event Processing
We study the problem of identifying discriminative features in Big Data arising from heterogeneous sensors. We highlight the heterogeneity of sensor data from engineering applications and the challenges of automatically extracting only the most interesting features from large datasets. We formulate this problem as classification of multivariate time series and design shapelet-based algorithms for the task. We propose a novel approach, called Shapelet Forests (SF), which combines shapelet extraction with feature selection. We compare the proposed method against other approaches for mining shapelets from multivariate time series, using data from real-world engineering applications. Quantitative analysis of the experiments shows that SF outperforms the baseline approaches and achieves high classification accuracy. In addition, the method identifies noisy sensors in multivariate data and discounts their use for classification.
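To make the shapelet idea concrete, the following is a minimal sketch of the distance computation that shapelet-based classifiers commonly build on: a shapelet is a short subsequence, and its distance to a series is the smallest Euclidean distance to any window of the same length. This is not the SF algorithm itself. The function names `shapelet_distance` and `shapelet_features`, the sensor names, and the toy values are all assumptions made for illustration.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series (hypothetical helper)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def shapelet_features(sample, shapelets):
    """Turn one multivariate sample into a feature vector.

    sample:    dict mapping a sensor name to its 1-D series
    shapelets: list of (sensor name, shapelet) pairs
    Each feature is the distance of one shapelet to its own sensor's series.
    """
    return [shapelet_distance(sample[sensor], shape) for sensor, shape in shapelets]


# Toy data (assumed): two sensors, one shapelet per sensor.
sample = {
    "pressure": [0, 0, 1, 2, 1, 0],
    "temperature": [5, 5, 5, 5, 5, 5],
}
shapelets = [("pressure", [1, 2, 1]), ("temperature", [0, 1, 0])]
features = shapelet_features(sample, shapelets)
print(features)
```

Under these assumptions, each sample becomes a fixed-length vector of distances. A feature selection step could then operate on such vectors, and features derived from an uninformative sensor would be candidates for removal.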