Estimating leverage scores via rank revealing methods and randomization

by Aleksandros Sobczyk and Efstratios Gallopoulos
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms. We first develop a set of fast novel algorithms for rank estimation, column subset selection and least squares preconditioning. We then describe the design and implementation of leverage score estimators based on these primitives… 
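For orientation, the statistical leverage scores that these primitives estimate have a simple exact characterization: they are the squared row norms of any orthonormal basis for the column space. A minimal NumPy sketch of that definition (not the paper's rank-revealing pipeline; the matrix and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))  # tall dense matrix, full column rank

# Leverage scores: squared row norms of an orthonormal basis Q of range(A).
Q, _ = np.linalg.qr(A)           # thin QR factorization
scores = np.sum(Q**2, axis=1)

# Sanity checks: each score lies in [0, 1] and they sum to rank(A) = d.
assert np.all((scores >= 0) & (scores <= 1 + 1e-12))
assert np.isclose(scores.sum(), d)
```

The exact QR route costs O(nd²); the point of the sketching-based estimators surveyed here is to approach this quality at far lower cost.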

A quantum-inspired algorithm for approximating statistical leverage scores

This work proposes a quantum-inspired algorithm for approximating the statistical leverage scores of a matrix A, and shows that it runs in time polynomial in an integer k, the condition number κ, and the logarithm of the matrix size.

pylspack: Parallel algorithms and data structures for sketching, column subset selection, regression and leverage scores

This work presents parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra, with a special focus on “tall-and-skinny” matrices, which arise in many applications.

Approximate Euclidean lengths and distances beyond Johnson-Lindenstrauss

An algorithm is presented to estimate the Euclidean lengths of the rows of a matrix, with element-wise probabilistic bounds that are at least as good as standard JL approximations in the worst case and asymptotically better for matrices with decaying spectrum.
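The standard JL baseline that this work improves upon is easy to state: sketch the rows with a random Gaussian map and take the norms of the sketched rows. A minimal sketch of that baseline (synthetic data; the sketch size k is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 500, 200, 64
A = rng.standard_normal((n, d))

# JL estimate: project rows through a scaled Gaussian map, then take norms.
G = rng.standard_normal((d, k)) / np.sqrt(k)
est = np.linalg.norm(A @ G, axis=1)

true = np.linalg.norm(A, axis=1)
rel_err = np.abs(est - true) / true   # concentrates around ~1/sqrt(2k)
```

The per-row relative error of this estimator scales like 1/√(2k) regardless of the spectrum; the paper's point is that one can do asymptotically better when the spectrum decays.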

Revisiting the Nyström Method for Improved Large-scale Machine Learning

An empirical evaluation of the approximation quality and running time of random sampling and random projection methods on a diverse suite of SPSD matrices, complemented by worst-case theoretical bounds for both classes of methods.
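As a reminder of the object under evaluation, the Nyström method approximates an SPSD matrix K from a subset of its columns as C W⁺ Cᵀ, where C holds the sampled columns and W is the corresponding principal submatrix. A minimal sketch with uniform column sampling on an exactly low-rank synthetic kernel, where the approximation is exact once the sampled columns span the range (the paper studies far richer sampling schemes, including leverage-score sampling):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 300, 10
X = rng.standard_normal((n, r))
K = X @ X.T                                   # SPSD matrix of rank r

# Uniform column sampling: m columns, m > r so they span range(K) generically.
m = 30
idx = rng.choice(n, size=m, replace=False)
C = K[:, idx]                                 # sampled columns
W = K[np.ix_(idx, idx)]                       # intersection block

K_nys = C @ np.linalg.pinv(W) @ C.T           # Nystrom approximation
err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
```

For genuinely full-rank kernels the error depends heavily on how the columns are chosen, which is precisely the sampling-vs-projection trade-off the paper evaluates.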

An Empirical Evaluation of Sketched SVD and its Application to Leverage Score Ordering

This work presents Sketched Leverage Score Ordering, a technique for determining the ordering of training data for neural networks based on the distributed computation of leverage scores using random projections; the method is faster than standard randomized projection algorithms and yields improvements in convergence and results.

Input Sparsity Time Low-rank Approximation via Ridge Leverage Score Sampling

We present a new algorithm for finding a near optimal low-rank approximation of a matrix $A$ in $O(nnz(A))$ time. Our method is based on a recursive sampling scheme for computing a representative…
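The quantity being sampled here, the λ-ridge leverage score of row i, is aᵢᵀ(AᵀA + λI)⁻¹aᵢ, a regularized softening of the ordinary leverage score. A minimal sketch of the definition via direct computation (the paper's contribution is a recursive scheme that avoids exactly this O(nd²) cost; λ and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 300, 25, 1.0
A = rng.standard_normal((n, d))

# lambda-ridge leverage score of row i:  a_i^T (A^T A + lam*I)^{-1} a_i
M = np.linalg.inv(A.T @ A + lam * np.eye(d))
ridge = np.einsum('ij,jk,ik->i', A, M, A)

# Their sum equals sum_j sigma_j^2 / (sigma_j^2 + lam): a smooth proxy
# for rank that governs how many samples the scheme needs.
s = np.linalg.svd(A, compute_uv=False)
effective_dim = np.sum(s**2 / (s**2 + lam))
```

Unlike ordinary leverage scores, which sum to the rank, ridge scores sum to this "effective dimension", which is what makes them usable for matrices of arbitrary rank.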

Provable deterministic leverage score sampling

This work provides a novel theoretical analysis of deterministic leverage score sampling and shows that such sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay.
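Deterministic leverage score sampling is simply: compute the scores and keep the k rows with the largest values, with no randomness. A minimal sketch on a synthetic matrix engineered to have a steep score decay (the scaling of the first five rows is an illustrative way to create dominant rows):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 400, 15
A = rng.standard_normal((n, d))
A[:5] *= 20.0                     # five dominant rows -> steep score decay

# Exact leverage scores via thin QR.
Q, _ = np.linalg.qr(A)
scores = np.sum(Q**2, axis=1)

# Deterministic sampling: keep the k rows with the largest leverage scores.
k = 40
keep = np.argsort(scores)[::-1][:k]
```

Under the power-law decay condition of the paper, this deterministic choice provably captures as much of the spectrum as randomized sampling would; here the five dominant rows land in the selected set.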

Fast approximation of matrix coherence and statistical leverage

A randomized algorithm is proposed that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns, as output, relative-error approximations to all n of the statistical leverage scores.
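The algorithm's structure can be sketched in a few lines: compress A from the left with a subspace embedding, take the R factor of the compressed matrix, and estimate the row norms of A R⁻¹ with a second, right-side JL sketch. A simplified NumPy version, with plain Gaussian sketches standing in for the structured (SRHT) transforms the paper uses for speed, and illustrative sketch sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 2000, 20
A = rng.standard_normal((n, d))

# Exact scores for comparison.
Q, _ = np.linalg.qr(A)
exact = np.sum(Q**2, axis=1)

# Two-sided sketched pipeline (Gaussian maps in place of SRHT/JL transforms).
r1, r2 = 200, 64
S = rng.standard_normal((r1, n)) / np.sqrt(r1)   # left subspace embedding
_, R = np.linalg.qr(S @ A)                       # d x d triangular factor
G = rng.standard_normal((d, r2)) / np.sqrt(r2)   # right JL map
approx = np.sum((A @ np.linalg.solve(R, G))**2, axis=1)
```

The cost is dominated by applying the sketches, not by an O(nd²) orthogonalization of A itself, which is what makes relative-error estimates of all n scores affordable when n ≫ d.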

Probabilistic Leverage Scores for Parallelized Unsupervised Feature Selection

The use of Probabilistic PCA is proposed to compute the leverage scores in O(mnk) time, enabling the application of randomized leverage-score methods to large, high-dimensional data sets, together with a parallelized version built on the Resilient Distributed Datasets paradigm of Apache Spark.

Augmented Leverage Score Sampling with Bounds

An empirical evaluation of the proposed augmented leverage scores on the column subset selection problem (CSSP) is presented, comparing them to traditional leverage scores and other methods under both deterministic and probabilistic sampling paradigms.

An Empirical Evaluation of Sketching for Numerical Linear Algebra

This work investigates least squares regression, iteratively reweighted least squares, logistic regression, robust regression with Huber and Bisquare loss functions, leverage score computation, Frobenius norm low rank approximation, and entrywise $\ell_1$-low rank approximation.

Tighter Low-rank Approximation via Sampling the Leveraged Element

This work proposes a new randomized algorithm for computing a low-rank approximation to a given matrix that combines the best aspects of otherwise disparate current results, but with a dependence on the condition number κ = σ1/σr.

Iterative Row Sampling

This work shows that alternating between computing a short sketch of the matrix and computing more accurate approximate leverage scores yields a series of geometrically smaller instances; the resulting algorithm runs in input-sparsity time plus an overhead comparable to the cost of solving a regression problem on the smaller approximation.