Corpus ID: 239998786

As easy as APC: overcoming missing data and class imbalance in time series with self-supervised learning

Fiorella Wever, T. Anderson Keller, Victor Garcia, Laura Symul
High levels of missing data and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. Existing methods approach these problems separately, frequently making significant assumptions about the underlying data generation process in order to lessen the impact of missing information. In this work, we instead demonstrate how a general self-supervised training method, namely Autoregressive Predictive Coding (APC), can be leveraged to… 
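As a rough illustration of the autoregressive predictive objective named in the abstract, the sketch below builds input/target pairs shifted by a fixed number of steps and scores a prediction with an L1 loss. The function names, the `n_ahead` parameter, and the choice to skip missing (NaN) targets in the loss are assumptions made for illustration; the truncated abstract does not specify how the authors handle missing values, and a real model would replace the placeholder predictor used here.

```python
import numpy as np

def apc_pairs(x, n_ahead=1):
    """Split a (T, D) series into inputs and targets shifted n_ahead steps ahead."""
    x = np.asarray(x, dtype=float)
    if not 0 < n_ahead < len(x):
        raise ValueError("n_ahead must be between 1 and len(x) - 1")
    return x[:-n_ahead], x[n_ahead:]

def masked_l1(pred, target):
    """Mean absolute error over observed (non-NaN) target entries only.

    Skipping missing targets is an assumption of this sketch.
    """
    observed = ~np.isnan(target)
    return np.abs(pred[observed] - target[observed]).mean()

# Toy series: 5 time steps, 2 variables.
x = np.arange(10, dtype=float).reshape(5, 2)
inputs, targets = apc_pairs(x, n_ahead=2)

# Placeholder "model": predict that the future frame equals the current one.
loss = masked_l1(inputs, targets)
```

In practice the placeholder predictor would be a recurrent network whose hidden states are later reused for classification; the pairing and loss are the only parts sketched here.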


Recurrent Neural Networks for Multivariate Time Series with Missing Values
Novel deep learning models are developed based on the Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. The models take two representations of missing patterns, i.e., masking and time interval, and incorporate them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series but also uses the missing patterns to achieve better prediction results.
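The two missing-pattern representations mentioned in this entry can be sketched as follows. This is a minimal illustration that assumes missing values are encoded as NaN and that each row has a timestamp; the function name `mask_and_intervals` is hypothetical. The mask is 1 where a value is observed, and the time interval records how long it has been since each variable was last observed.

```python
import numpy as np

def mask_and_intervals(x, timestamps):
    """Return (mask, delta) for a (T, D) series with NaN marking missing values.

    mask[t, d]  = 1 if x[t, d] is observed, else 0.
    delta[t, d] = time elapsed since variable d was last observed,
                  accumulated across consecutive missing steps; 0 at t = 0.
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(timestamps, dtype=float)
    mask = (~np.isnan(x)).astype(int)
    delta = np.zeros_like(x)
    for t in range(1, len(x)):
        gap = s[t] - s[t - 1]
        # If the previous step was observed, the interval restarts at the gap;
        # otherwise the gap is added to the interval carried so far.
        delta[t] = np.where(mask[t - 1] == 1, gap, gap + delta[t - 1])
    return mask, delta

# Toy data: 7 time steps, 2 variables, irregular timestamps.
nan = np.nan
x = np.array([[47, nan], [49, 15], [nan, 14], [40, nan],
              [nan, nan], [43, nan], [55, 15]])
s = [0.0, 0.1, 0.6, 1.6, 2.2, 2.5, 3.1]
mask, delta = mask_and_intervals(x, s)
```

Both arrays have the same shape as the input, so they can be concatenated with it or fed to a model as separate channels.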
BRITS: Bidirectional Recurrent Imputation for Time Series
BRITS is a novel method based on recurrent neural networks for missing value imputation in time series data that directly learns the missing values in a bidirectional recurrent dynamical system, without any specific assumption.
Time series cluster kernels to exploit informative missingness and incomplete label information
This work creates a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited, and proposes a semi-supervised kernel, capable of taking advantage of incomplete label information to learn more accurate similarities.
Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
Experimental results clearly show that the unsupervised pre-training approach improves the performance of deep LSTM and leads to better and faster convergence than other models.
Strategies for learning in class imbalance problems
A set of examples or training set (TS) is said to be imbalanced if one of the classes is represented by a very small number of cases compared to the other classes. Following the common practice
Dual autoencoders features for imbalance classification problem
This work proposes the first feature-learning-based method for imbalanced pattern classification, using stacked autoencoders, and shows that the resulting dual autoencoder features (DAF) outperform current resampling-based methods with statistical significance on imbalanced pattern classification problems.
A Survey of Predictive Modelling under Imbalanced Distributions
The main challenges raised by imbalanced distributions are discussed, the main approaches to these problems are described, a taxonomy of these methods is proposed and some related problems within predictive modelling are referred to.
SMOTE: Synthetic Minority Over-sampling Technique
Combining over-sampling of the minority (abnormal) class with under-sampling of the majority class can achieve better classifier performance (in ROC space) than under-sampling the majority class alone. The approach is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
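The summary above does not describe how the synthetic minority samples are produced. The sketch below assumes the commonly described mechanism: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch` and its parameters are hypothetical, and this is not a substitute for a maintained implementation.

```python
import numpy as np

def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample, one of its k neighbours, and a random gap in [0, 1).
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])

# Toy minority class: the four corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(minority, n_synthetic=10, k=2)
```

Because every synthetic point lies between two existing minority samples, the generated data stays inside the region already covered by the minority class.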
Cluster-based under-sampling approaches for imbalanced data distributions
Cluster-based under-sampling approaches are proposed for selecting representative data as training data to improve classification accuracy for the minority class; the experimental results show that these approaches outperform the other under-sampling techniques from previous studies.
Class Weights Random Forest Algorithm for Processing Class Imbalanced Medical Data
Validation tests on UCI data sets demonstrate that, for imbalanced medical data, the proposed method enhances the overall performance of the classifier while achieving high accuracy in identifying both the majority and minority classes.