Corpus ID: 208310367

LSAR: Efficient Leverage Score Sampling Algorithm for the Analysis of Big Time Series Data

Authors: Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney
Journal: arXiv: Methodology
We apply methods from randomized numerical linear algebra (RandNLA) to develop improved algorithms for the analysis of large-scale time series data. We first develop a new fast algorithm to estimate the leverage scores of an autoregressive (AR) model in big data regimes. We show that, with high probability, the approximate leverage scores lie within a factor of $(1+\mathcal{O}(\varepsilon))$ of the true leverage scores. These theoretical results are subsequently exploited to develop an efficient algorithm…
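The abstract combines two ingredients: the leverage scores of the AR(p) design matrix, and row sampling driven by those scores. The sketch below illustrates both under simplifying assumptions. It computes the scores exactly with a thin QR factorization, whereas LSAR's contribution is a fast approximation of them, so this shows the sampling idea rather than the paper's algorithm. The helper names (`ar_design`, `leverage_scores`, `leverage_sample_ls`) and the AR(2) coefficients in the usage lines are hypothetical.

```python
import numpy as np

def ar_design(y, p):
    """AR(p) regression y[t] ~ y[t-1], ..., y[t-p] for t = p, ..., n-1."""
    n = len(y)
    # Column k holds lag k+1, aligned so that row i corresponds to t = p + i.
    X = np.column_stack([y[p - k - 1 : n - k - 1] for k in range(p)])
    return X, y[p:]

def leverage_scores(X):
    """Exact leverage scores: squared row norms of Q in the thin QR of X."""
    Q, _ = np.linalg.qr(X)  # 'reduced' mode is NumPy's default
    return np.sum(Q**2, axis=1)

def leverage_sample_ls(X, b, s, rng):
    """Solve a rescaled least-squares problem on s rows drawn by leverage score."""
    probs = leverage_scores(X)
    probs = probs / probs.sum()
    idx = rng.choice(len(b), size=s, replace=True, p=probs)
    # Importance weights make the sampled Gram matrix an unbiased estimate of X^T X.
    w = 1.0 / np.sqrt(s * probs[idx])
    coef, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * b[idx], rcond=None)
    return coef

# Usage: simulate a hypothetical AR(2) series and estimate it from 2,000 sampled rows.
rng = np.random.default_rng(0)
phi = np.array([0.5, -0.3])
y = np.zeros(20_000)
for t in range(2, len(y)):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + rng.standard_normal()
X, b = ar_design(y, 2)
phi_hat = leverage_sample_ls(X, b, 2_000, rng)
```

The leverage scores of a full-rank design with p columns sum to p, which makes a quick sanity check on any approximation.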


Rollage: Efficient Rolling Average Algorithm to Estimate ARMA Models for Big Time Series Data
We develop a new method to estimate the order of an AR model in the presence of big time series data. Using the concept of a rolling average, we develop a new efficient algorithm, called Rollage, to…
Toeplitz Least Squares Problems, Fast Algorithms and Big Data
This work investigates and compares the quality of these two approximation algorithms on large-scale synthetic and real-world data and concludes that RandNLA is effective in the context of big-data time series.
Augmented Tensor Decomposition with Stochastic Alternating Optimization
Tensor decompositions are powerful tools for dimensionality reduction and feature interpretation of multidimensional data such as signals. Existing tensor decomposition objectives (e.g., Frobenius…
Augmented Tensor Decomposition with Stochastic Optimization
Tensor decompositions are powerful tools for dimensionality reduction and feature interpretation of multidimensional data such as signals. Existing tensor decomposition objectives (e.g., Frobenius…
Surprise Maximization: A Dynamic Programming Approach
Borwein et al. [1] solved a “surprise maximization” problem by applying results from convex analysis and mathematical programming. Although their proof is elegant, it requires advanced knowledge…
MTC: Multiresolution Tensor Completion from Partial and Coarse Observations
The proposed Multi-resolution Tensor Completion model (MTC) explores tensor mode properties and leverages the hierarchy of resolutions to recursively initialize an optimization setup, and optimizes on the coupled system using alternating least squares to ensure low computational and space complexity.
Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition
This work presents an application of randomized numerical linear algebra to fitting the CP decomposition of sparse tensors, solving a significantly smaller sampled least squares problem at each iteration with probabilistic guarantees on the approximation errors.


Online adaptive lasso estimation in vector autoregressive models for high dimensional wind power forecasting
This paper proposes a time-adaptive lasso estimator and an efficient coordinate descent algorithm for updating the VAR model parameters recursively online, and shows, on simulated data, a good ability to track changes in the multivariate time series dynamics.
Information-Based Optimal Subdata Selection for Big Data Linear Regression
Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude, and the advantages of the new approach are also illustrated through analysis of real data.
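Unlike random subsampling, information-based subdata selection keeps rows deterministically, favouring extreme covariate values because they carry the most information about linear-model slopes. The sketch below follows that idea under stated assumptions: for each covariate in turn it keeps the r smallest and r largest values among rows not yet chosen, with r = k // (2p). The function name `iboss_select` and the synthetic data are hypothetical, and full sorts are used for clarity where a speed-focused version would use partial selection such as `np.argpartition`.

```python
import numpy as np

def iboss_select(X, k):
    """Hypothetical IBOSS-style selection of about k rows from X (n x p)."""
    n, p = X.shape
    r = k // (2 * p)
    if r < 1:
        raise ValueError("k must be at least 2 * p")
    remaining = np.arange(n)
    chosen = []
    for j in range(p):
        # Rows not yet chosen, ordered by covariate j.
        order = remaining[np.argsort(X[remaining, j])]
        picked = np.concatenate([order[:r], order[-r:]])  # r extremes on each side
        chosen.append(picked)
        remaining = np.setdiff1d(remaining, picked)
    return np.concatenate(chosen)

# Usage: fit a linear model with intercept on 300 of 10,000 synthetic rows.
rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = 3.0 + X @ beta + rng.standard_normal(10_000)
idx = iboss_select(X, 300)
Z = np.column_stack([np.ones(len(idx)), X[idx]])
coef, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
```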
A statistical perspective on algorithmic leveraging
This work provides an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model and shows that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other.
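The comparison above is between two sampling distributions over the rows of a regression problem. A minimal sketch, assuming a hypothetical mixing weight `alpha` (1 gives pure leverage-based sampling, 0 gives uniform sampling), lets both schemes, and blends of them, be run on the same data. The function name `sampled_ls` and the heavy-tailed synthetic design, chosen so that leverage scores are uneven, are illustrative assumptions.

```python
import numpy as np

def sampled_ls(X, y, s, alpha, rng):
    """Sample s rows with probabilities alpha*leverage + (1-alpha)*uniform,
    reweight each by 1/sqrt(s*pi), and solve the small least-squares problem."""
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)
    lev = np.sum(Q**2, axis=1)
    pi = alpha * lev / lev.sum() + (1 - alpha) / n  # sums to 1 for any alpha in [0, 1]
    idx = rng.choice(n, size=s, replace=True, p=pi)
    w = 1.0 / np.sqrt(s * pi[idx])
    coef, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return coef

# Usage: a heavy-tailed design gives a few high-leverage rows.
rng = np.random.default_rng(4)
X = rng.standard_t(df=3, size=(5_000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(5_000)
estimates = {a: sampled_ls(X, y, 500, a, rng) for a in (0.0, 0.9, 1.0)}
```

Repeating the draw many times and comparing the spread of each estimator across runs is the natural way to study the bias and variance trade-off described above.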
Fast approximation of matrix coherence and statistical leverage
A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.
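The structure of such a relative-error approximation can be sketched in two sketching steps: compress the rows of A to obtain a matrix R that approximately orthogonalizes A, then shrink the columns of A R^{-1} with a Johnson-Lindenstrauss projection before taking squared row norms. This sketch swaps the fast structured transforms that make the original method efficient for dense Gaussian matrices, so it captures the computation's structure, not its running time. The function name `approx_leverage` and the sketch sizes `r1` and `r2` are hypothetical.

```python
import numpy as np

def approx_leverage(A, r1, r2, rng):
    """Two-sketch approximation of the leverage scores of a tall matrix A (n x d)."""
    n, d = A.shape
    S1 = rng.standard_normal((r1, n)) / np.sqrt(r1)  # compress rows: r1 x n
    _, R = np.linalg.qr(S1 @ A)                      # A @ inv(R) has nearly orthonormal columns
    S2 = rng.standard_normal((d, r2)) / np.sqrt(r2)  # JL projection: d -> r2 columns
    B = A @ np.linalg.solve(R, S2)                   # n x r2, without forming inv(R)
    return np.sum(B**2, axis=1)                      # approximate squared row norms

# Usage: compare with exact scores on a matrix with uneven row scales.
rng = np.random.default_rng(2)
A = rng.standard_normal((4_000, 5)) * rng.exponential(size=(4_000, 1))
approx = approx_leverage(A, r1=500, r2=400, rng=rng)
Q, _ = np.linalg.qr(A)
exact = np.sum(Q**2, axis=1)
```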
The Importance of Environmental Factors in Forecasting Australian Power Demand
We develop a time series model to forecast weekly peak power demand for three main states of Australia for a yearly timescale, and show the crucial role of environmental factors in improving the…
Randomized Algorithms for Matrices and Data
This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Low-Rank Approximation and Regression in Input Sparsity Time
We design a new distribution over $m \times n$ matrices $S$ so that, for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least $9/10$, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. Here, $m$ is…
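A sparse embedding of this kind can be applied without ever forming S. In the CountSketch construction sketched here, each column of S has a single random ±1 entry in a random row, so computing SA costs one pass over the rows of A. The value of m is cut off in the summary above, so `m` below is a free parameter chosen generously for the check. Because S is linear, the norm-preservation property for all x is equivalent to the singular values of SQ lying in [1-ε, 1+ε], where Q is an orthonormal basis for the column space of A. The function name `countsketch` is a hypothetical helper.

```python
import numpy as np

def countsketch(A, m, rng):
    """Apply a CountSketch S (m x n, one random +/-1 per column) to A."""
    n = A.shape[0]
    h = rng.integers(0, m, size=n)          # bucket for each input row
    sign = rng.choice([-1.0, 1.0], size=n)  # random sign for each input row
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, h, sign[:, None] * A)     # SA[h[i]] += sign[i] * A[i], repeats accumulate
    return SA

# Usage: check the subspace-embedding distortion on an orthonormal basis.
rng = np.random.default_rng(3)
A = rng.standard_normal((20_000, 4))
Q, _ = np.linalg.qr(A)
sv = np.linalg.svd(countsketch(Q, 2_000, rng), compute_uv=False)
```

`np.add.at` is used instead of fancy-index assignment because several input rows can hash to the same bucket, and those contributions must be summed rather than overwritten.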
Assessing stochastic algorithms for large scale nonlinear least squares problems using extremal probabilities of linear combinations of gamma random variables
This paper proposes eight variants of a practical randomized algorithm where the uncertainties in the major stochastic steps are quantified, and proves tight necessary and sufficient conditions on the sample size to satisfy the prescribed probabilistic accuracy.
A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle
This paper models occasional, discrete shifts in the growth rate of a nonstationary series. Algorithms for inferring these unobserved shifts are presented, a byproduct of which permits estimation of…
Demand forecasting in the presence of systematic events: Cases in capturing sales promotions
This paper develops and tests a novel regime-switching approach to quantify systematic information/events and objectively incorporate them into the baseline statistical model. The results indicate that the proposed model can improve forecast accuracy compared with current industry practice, which relies heavily on human judgment to factor in all types of information/events.