Robust change detection for large-scale data streams

Ruizhi Zhang, Yajun Mei, JianJun Shi
Sequential Analysis, pp. 1-19
Abstract: Robust change point detection for large-scale data streams has many real-world applications in industrial quality control, signal detection, and biosurveillance. Unfortunately, it is highly nontrivial to develop efficient schemes due to three challenges: (1) the unknown sparse subset of affected data streams, (2) the unexpected outliers, and (3) computational scalability for real-time monitoring and detection. In this article, we develop a family of efficient real-time robust detection…


Large-Scale Multi-Stream Quickest Change Detection via Shrinkage Post-Change Estimation
  • Y. Wang, Y. Mei
  • Computer Science
    IEEE Transactions on Information Theory
  • 2015
This work proposes a systematic approach to developing efficient global monitoring schemes for quickest change detection by combining hard thresholding with linear shrinkage estimators to estimate all post-change parameters simultaneously.
Scalable SUM-Shrinkage Schemes for Distributed Monitoring Large-Scale Data Streams
This article develops scalable global monitoring schemes by running local detection procedures in parallel and combining them into a global decision based on SUM-shrinkage techniques.
Efficient scalable schemes for monitoring a large number of data streams
A family of scalable schemes is proposed based on the sum of the local cumulative sum (CUSUM) statistics from each individual data stream, and is shown to asymptotically minimize the detection delays for each and every possible combination of affected data streams, subject to the global false alarm constraint.
Optimal sequential detection in multi-stream data
  • H. Chan
  • Computer Science, Mathematics
  • 2015
This work shows how the (optimal) detection delay depends on the fraction of data streams undergoing distribution changes as the number of detectors goes to infinity, and shows that the optimal detection delay is achieved by the sum of detectability score transformations of either the partial scores or CUSUM scores of the data streams.
Sequential multi-sensor change-point detection
  • Yao Xie, D. Siegmund
  • Mathematics
    2013 Information Theory and Applications Workshop (ITA)
  • 2013
We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one
Asymptotic statistical properties of communication-efficient quickest detection schemes in sensor networks
This work develops scalable communication-efficient schemes based on the sum of those local cumulative sum statistics that are “large” under either hard, soft, or order thresholding rules and illustrates the deep connections between communication efficiency and statistical efficiency.
Asymptotic Statistical Properties of Communication-Efficient Quickest Detection Schemes in Sensor Networks
This work develops scalable communication-efficient schemes based on the sum of those local CUSUM statistics that are “large” under either hard, soft, or order thresholding rules and establishes their asymptotic statistical properties under two regimes.
Efficient Computer Network Anomaly Detection by Changepoint Detection Methods
This work proposes a novel score-based multi-cyclic detection algorithm based on the Shiryaev-Roberts procedure that is as easy to employ in practice and as computationally inexpensive as the popular cumulative sum (CUSUM) chart and the exponentially weighted moving average (EWMA) scheme.
Asymptotically Optimal Quickest Change Detection in Distributed Sensor Systems
This paper presents asymptotically optimal decentralized quickest change detection procedures for two scenarios, considers the minimax, uniform, and Bayesian versions of the optimization problem, and reports simulation results for examples involving Gaussian and Poisson observations.
Statistical Challenges Facing Early Outbreak Detection in Biosurveillance
This work focuses mainly on the monitoring of time series to provide early alerts of anomalies to stimulate investigation of potential outbreaks, with a brief summary of methods to detect significant spatial and spatiotemporal case clusters.
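
Several of the entries above build a global monitoring statistic from per-stream CUSUM statistics, either by summing them directly or by summing only those that are "large" under hard, soft, or order thresholding. The following is a minimal illustrative sketch of that idea, not the procedure of any one listed paper and not the robust scheme of the main article. It assumes a Gaussian mean shift from N(0, 1) to N(mu1, 1) with a known post-change mean; the function names, the parameters `mu1`, `b`, `r`, and `threshold`, and the exact form of each thresholding rule are assumptions made for illustration.

```python
def update_cusum(prev, x, mu1=1.0):
    """One step of the CUSUM recursion for a single stream.

    Assumed model (not from the source): N(0, 1) before the change,
    N(mu1, 1) after it, with mu1 known.
    """
    # Log-likelihood ratio of N(mu1, 1) against N(0, 1) at observation x.
    llr = mu1 * x - 0.5 * mu1 ** 2
    return max(prev + llr, 0.0)


def global_statistic(local_stats, rule="sum", b=0.0, r=1):
    """Combine local CUSUM statistics into one global statistic.

    rule="sum":   plain sum of all local statistics
    rule="hard":  sum of the statistics that are at least b
    rule="soft":  sum of the excesses max(w - b, 0)
    rule="order": sum of the r largest statistics
    """
    if rule == "sum":
        return sum(local_stats)
    if rule == "hard":
        return sum(w for w in local_stats if w >= b)
    if rule == "soft":
        return sum(max(w - b, 0.0) for w in local_stats)
    if rule == "order":
        return sum(sorted(local_stats, reverse=True)[:r])
    raise ValueError(f"unknown rule: {rule}")


def monitor(observations, threshold, rule="sum", b=0.0, r=1, mu1=1.0):
    """Return the first time step (1-based) at which the global statistic
    reaches the threshold, or None if no alarm is raised.

    observations: iterable of equal-length lists, one value per stream.
    """
    local = None
    for t, obs in enumerate(observations, start=1):
        if local is None:
            local = [0.0] * len(obs)  # every stream starts at zero
        local = [update_cusum(w, x, mu1) for w, x in zip(local, obs)]
        if global_statistic(local, rule, b, r) >= threshold:
            return t
    return None
```

Each stream only needs its previous CUSUM value and the newest observation, so the per-step cost grows linearly with the number of streams, and the thresholding rules discard small local statistics before they are combined. In practice the alarm threshold would be calibrated to a false alarm constraint, which this sketch does not attempt.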