• Corpus ID: 199452961

Online Detection of Sparse Changes in High-Dimensional Data Streams Using Tailored Projections

  • Martin Tveten, Ingrid Kristine Glad
  • arXiv: Methodology
When applying principal component analysis (PCA) for dimension reduction, the most varying projections are usually used in order to retain most of the information. For the purpose of anomaly and change detection, however, the least varying projections are often the most important ones. In this article, we present a novel method that automatically tailors the choice of projections to monitor for sparse changes in the mean and/or covariance matrix of high-dimensional data. A subset of the least… 
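
The idea of monitoring the least varying projections can be illustrated with a minimal sketch. It is not the authors' method: the abstract describes an automatic, tailored choice of projections, whereas this sketch simply takes the `k` components with the smallest eigenvalues. The function name `least_varying_scores` and the parameter `k` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def least_varying_scores(X_train, X_new, k):
    """Project X_new onto the k least varying principal axes of X_train.

    Illustrative assumption: the k smallest-eigenvalue components are used,
    with no tailoring of the choice to the type of change.
    """
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the first k columns
    # of eigvecs are the least varying directions.
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs[:, :k]
    # Standardise each projection by its training standard deviation.
    return (X_new - mean) @ W / np.sqrt(eigvals[:k])
```

Because each projection is scaled by its training standard deviation, a shift that is small in the original coordinates can appear large in these scores, which is one intuition for why the least varying projections are useful for change detection.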

Data Invariants: On Trust in Data-Driven Systems

It is empirically shown that data invariants can reliably detect tuples on which the prediction of a machine-learned model should not be trusted, and quantify data drift more accurately than the state-of-the-art methods.

Conformance Constraint Discovery: Measuring Trust in Data-Driven Systems

It is empirically shown that conformance constraints offer mechanisms to reliably detect tuples on which the inference of a machine-learned model should not be trusted, and quantify data drift more accurately than the state of the art.



Which principal components are most sensitive to distributional changes

PCA is often used in anomaly detection and statistical process control tasks. For bivariate data, we prove that the minor projection (the least varying projection) of the PCA-rotated data is the most

Large-Scale Multi-Stream Quickest Change Detection via Shrinkage Post-Change Estimation

  • Y. Wang, Y. Mei
  • Computer Science
    IEEE Transactions on Information Theory
  • 2015
This work proposes a systematic approach to develop efficient global monitoring schemes for quickest change detection by combining hard thresholding with linear shrinkage estimators to estimate all post-change parameters simultaneously.

A PCA-Based Change Detection Framework for Multidimensional Data Streams: Change Detection in Multidimensional Data Streams

This paper proposes a framework for detecting changes in multidimensional data streams based on principal component analysis. PCA is used to project the data into a lower-dimensional space, which facilitates density estimation and change-score calculations; the framework is reported to have advantages over existing approaches.

Optimal sequential detection in multi-stream data

  • H. Chan
  • Computer Science, Mathematics
  • 2015
This work shows how the (optimal) detection delay depends on the fraction of data streams undergoing distribution changes as the number of detectors goes to infinity, and shows that the optimal detection delay is achieved by the sum of detectability score transformations of either the partial scores or CUSUM scores of the data streams.

PCA Feature Extraction for Change Detection in Multidimensional Unlabeled Data

This work proposes to apply principal component analysis (PCA) for feature extraction prior to detecting changes in multidimensional unlabeled data and shows that feature extraction through PCA is beneficial, specifically for data with multiple balanced classes.

In-Network PCA and Anomaly Detection

A PCA-based anomaly detector in which adaptive local data filters send to a coordinator just enough data to enable accurate global detection is developed, based on a stochastic matrix perturbation analysis that characterizes the tradeoff between the accuracy of anomaly detection and the amount of data communicated over the network.

Sequential multi-sensor change-point detection

  • Yao Xie, D. Siegmund
  • Mathematics
    2013 Information Theory and Applications Workshop (ITA)
  • 2013
We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one

Statistical Learning Methods Applied to Process Monitoring: An Overview and Perspective

An overview of the current state of data-driven multivariate statistical process monitoring methodology is given and some of the monitoring and surveillance techniques informed by data mining techniques that show promise for monitoring large and diverse data sets are highlighted.

A systematic comparison of PCA-based statistical process monitoring methods for high-dimensional, time-dependent processes

These fundamental methods will be systematically compared on high-dimensional, time-dependent processes to provide practitioners with guidelines for appropriate monitoring strategies and a sense of how they can be expected to perform.

An Adaptive Sampling Strategy for Online High-Dimensional Process Monitoring

A monitoring scheme based on the sum of the top-r local CUSUM statistics is developed and named “TRAS” (top-r based adaptive sampling). It is scalable and robust in detecting a wide range of possible mean shifts in all directions when each data stream follows a univariate normal distribution.
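
The "sum of top-r local CUSUM statistics" can be sketched as follows. This is only an illustration under stated assumptions: every stream is fully observed (the adaptive sampling part of TRAS is omitted), streams are N(0, 1) before the change, and the function name `top_r_cusum` and the assumed shift size `delta` are hypothetical choices, not taken from the paper. NumPy is assumed.

```python
import numpy as np

def top_r_cusum(X, delta=1.0, r=2):
    """Global statistic per time step: sum of the r largest local CUSUMs.

    X is a (T, p) array with one column per data stream, assumed N(0, 1)
    in control. Each stream runs a two-sided CUSUM tuned to a mean shift
    of size delta (an assumed value).
    """
    T, p = X.shape
    up = np.zeros(p)    # CUSUM for an upward mean shift
    down = np.zeros(p)  # CUSUM for a downward mean shift
    stats = np.empty(T)
    for t in range(T):
        up = np.maximum(0.0, up + delta * X[t] - delta**2 / 2)
        down = np.maximum(0.0, down - delta * X[t] - delta**2 / 2)
        local = np.maximum(up, down)
        # Sum only the r largest local statistics across streams.
        stats[t] = np.sort(local)[-r:].sum()
    return stats
```

An alarm would be raised the first time the returned statistic exceeds a threshold; how that threshold is calibrated is not specified here.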