Online Data Thinning via Multi-Subspace Tracking

  • Xin Jiang Hunt, Rebecca M. Willett
  • Published 2019
  • Mathematics, Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in data centers. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of this proposed approach is an online anomaly detection method based on dynamic, low…
L1-Subspace Tracking for Streaming Data
The superiority of the proposed L1-subspace tracking method compared to existing approaches is demonstrated through experimental studies in various application fields.
Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams.
This paper proposes a novel sparse representation-based DSC algorithm, called evolutionary dynamic sparse subspace clustering (EDSSC), that can cope with the time-varying nature of the subspaces underlying evolving data streams, such as subspace emergence, disappearance, and recurrence.
Streaming PCA and Subspace Tracking: The Missing Data Case
It is illustrated that streaming PCA and subspace tracking algorithms can be understood through algebraic and geometric perspectives, and they need to be adjusted carefully to handle missing data.
Real-Time Nonparametric Anomaly Detection in High-Dimensional Settings
This work models anomalies as persistent outliers, proposes to detect them via a cumulative sum-like algorithm, and provides an asymptotic lower bound and an asymptotic approximation for the average false alarm period of the proposed algorithm.
Sparse Subspace Clustering for Evolving Data Streams
  • Jinping Sui, Zhen Liu, +4 authors X. Li
  • Computer Science
  • ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
This paper proposes a sparse-based DSC algorithm, referred to as dynamic sparse subspace clustering (D-SSC), that recovers the low-dimensional subspaces (structures) of high-dimensional data streams and finds an explicit assignment of points to subspaces in an online manner.
Online Robust Principal Component Analysis With Change Point Detection
This paper develops an efficient online robust PCA method, namely, online moving window robust principal component analysis (OMWRPCA).
Clustering-Enhanced Stochastic Gradient MCMC for Hidden Markov Models with Rare States
This work proposes to use a preliminary clustering to over-sample the rare clusters and reduce variance in gradient estimation within Stochastic Gradient MCMC, and demonstrates very substantial gains in predictive and inferential accuracy on real and synthetic examples.


PETRELS: Parallel Subspace Estimation and Tracking by Recursive Least Squares From Partial Observations
The proposed algorithm, dubbed Parallel Estimation and Tracking by REcursive Least Squares (PETRELS), first identifies the underlying low-dimensional subspace, and then reconstructs the missing entries via least-squares estimation if required. PETRELS is compared with state-of-the-art batch algorithms.
PETRELS: Subspace estimation and tracking from partial observations
The proposed algorithm, called PETRELS, identifies the underlying low-dimensional subspace via a recursive procedure for each row of the subspace matrix in parallel, and then reconstructs the missing entries via least-squares estimation if required.
Change-Point Detection for High-Dimensional Time Series With Missing Data
The approach described in this paper leverages several recent results in the field of high-dimensional data analysis, including subspace tracking with missing data, multiscale analysis techniques for point clouds, online optimization, and change-point detection performance analysis.
Group-sparse subspace clustering with missing data
Two novel methods for subspace clustering with missing data are described: (a) group-sparse subspace clustering (GSSC), which is based on group-sparsity and alternating minimization, and (b) mixture subspace clustering (MSC), which models each data point as a convex combination of its projections onto all subspaces in the union.
The past few years have witnessed an explosion in the availability of data from multiple sources and modalities. For example, millions of cameras have been installed in buildings, streets, airports…
Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance
This paper proposes a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), that detects the outlying subspaces of high-dimensional data efficiently and outperforms other searching alternatives such as the naive top-down, bottom-up, and random search methods.
Detecting anomalies in cross-classified streams: a Bayesian approach
This research was motivated by the need to extract critical application information and business intelligence from the daily logs that accompany large-scale spoken dialog systems. It proposes an empirical Bayes method that works by fitting a two-component Gaussian mixture to deviations at the current time.
Sparse Subspace Clustering: Algorithm, Theory, and Applications
  • Ehsan Elhamifar, R. Vidal
  • Computer Science, Mathematics
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2013
This paper proposes and studies an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces, and demonstrates the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
Sparse subspace clustering
We propose a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space. Our method is based on the…
Fast mining of distance-based outliers in high-dimensional datasets
RBRP, a fast algorithm for mining distance-based outliers that is particularly targeted at high-dimensional datasets, is presented and shown to outperform the state-of-the-art algorithm, often by an order of magnitude.