Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions

Alekh Agarwal, Sahand N. Negahban, Martin J. Wainwright. 2014 48th Annual Conference on Information Sciences and Systems (CISS).
Summary form only given. Stochastic optimization algorithms have many desirable features for large-scale machine learning, and accordingly have been the focus of renewed and intensive study in the last several years (e.g., see the papers [2], [5], [14] and references therein). The empirical efficiency of these methods is backed by strong theoretical guarantees, providing sharp bounds on their convergence rates. These convergence rates are known to depend on the structure of the underlying…


Sample average approximation with sparsity-inducing penalty for high-dimensional stochastic programming

It is shown that, if an FCP-regularized SAA formulation is solved locally, then the required number of samples can be significantly reduced in approximating the global solution of a convex SP: the sample size is only required to be poly-logarithmic in the number of dimensions.

Efficient online algorithms for fast-rate regret bounds under sparsity

New risk bounds are established that are adaptive to the sparsity of the problem and to the regularity of the risk (ranging from a rate of $1/\sqrt{T}$ for general convex risk to $1/T$ for strongly convex risk), and they generalize previous works on sparse online learning.

Sparse recovery by reduced variance stochastic approximation

A multistage procedure is proposed for recovering sparse solutions to stochastic optimization problems under assumptions of smoothness and quadratic minoration of the expected objective, and it is shown how these assumptions lead to parameter estimates that obey the best accuracy bounds known to the authors.

Robust methods for high-dimensional linear learning

Statistically robust and computationally efficient linear learning methods are proposed for the high-dimensional batch setting, where the number of features $d$ may exceed the sample size $n$, and they are compared with other recent approaches proposed in the literature.

Faster Online Learning of Optimal Threshold for Consistent F-measure Optimization

This paper proposes an efficient online algorithm that simultaneously learns the posterior class probability and an optimal threshold, the latter by minimizing a stochastic strongly convex function with an unknown strong convexity parameter; the algorithm is provably faster than its predecessor, which relies on a heuristic for updating the threshold.

The Statistics of Streaming Sparse Regression

This work presents a sparse analogue to stochastic gradient descent that is guaranteed to perform well under similar conditions to the lasso, and substantially out-performs existing streaming algorithms on both real and simulated data.

Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Matrix Decomposition

Experiments show that for both sparse optimization and matrix decomposition problems, the multi-step version of the stochastic ADMM method outperforms the state-of-the-art methods.

Sparse Learning with Stochastic Composite Optimization

This paper proposes a simple yet effective two-phase Stochastic Composite Optimization scheme that adds a novel sparse online-to-batch conversion to general Stochastic Optimization algorithms.

Statistical inference for model parameters in stochastic gradient descent

This work investigates the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions, and proposes two consistent estimators of the asymptotic covariance of the average iterate from SGD.
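A minimal Python sketch of the setting, assuming a one-dimensional strongly convex loss $f(\theta; x) = \tfrac12(\theta - x)^2$ whose population minimizer is the mean of $X$. It runs SGD with step sizes $c\,t^{-\alpha}$, keeps the running average of the iterates, and forms a simple plug-in sandwich interval. The step-size schedule, the function names, and the interval construction are illustrative assumptions; the sketch is not claimed to reproduce either of the paper's covariance estimators exactly.

```python
import math
import random


def averaged_sgd_mean(samples, c=1.0, alpha=0.6):
    """Averaged SGD for f(theta; x) = 0.5 * (theta - x) ** 2.

    The population minimizer is E[X]. Step sizes c * t**(-alpha) with
    alpha in (0.5, 1) are a common choice when iterates are averaged.
    """
    theta = 0.0
    theta_bar = 0.0
    for t, x in enumerate(samples, start=1):
        grad = theta - x                      # stochastic gradient at the current iterate
        theta -= c * t ** (-alpha) * grad     # SGD step
        theta_bar += (theta - theta_bar) / t  # running (Polyak-Ruppert) average
    return theta_bar


def plug_in_interval(samples, theta_bar, z=1.96):
    """Approximate 95% interval from the sandwich form A^{-1} S A^{-1} / n.

    For this loss the Hessian A equals 1, and S is estimated by the mean
    squared per-sample gradient evaluated at theta_bar.
    """
    n = len(samples)
    s_hat = sum((theta_bar - x) ** 2 for x in samples) / n
    half_width = z * math.sqrt(s_hat / n)
    return theta_bar - half_width, theta_bar + half_width


random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(20000)]
est = averaged_sgd_mean(data)
low, high = plug_in_interval(data, est)
print(f"estimate {est:.3f}, interval ({low:.3f}, {high:.3f})")
```

In higher dimensions, the Hessian and gradient outer products would replace the scalars $A = 1$ and $S$.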

A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis

This paper aims to provide a sharp excess risk guarantee for learning a sparse linear model without any assumptions about the strong convexity of the expected loss and the sparsity of the optimal

Fast global convergence rates of gradient methods for high-dimensional statistical recovery

The theory guarantees that Nesterov's first-order method has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter $\theta^*$ and the optimal solution $\hat{\theta}$.
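As a rough illustration of an accelerated first-order method on an $\ell_1$-regularized problem, the Python sketch below runs an accelerated proximal gradient (FISTA-style) iteration on $\min_\beta \frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$. FISTA is one accelerated variant in Nesterov's family and may differ from the exact scheme analyzed in the paper. The step size $1/L$ uses the conservative bound $L = \|X\|_F^2/n$, which is an assumption made to keep the code dependency-free. The synthetic data and the function names are hypothetical.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(a) - tau, 0.0), a) for a in v]


def ls_gradient(X, y, beta):
    """Gradient of (1 / 2n) * ||y - X beta||^2."""
    n = len(X)
    resid = [sum(xij * bj for xij, bj in zip(row, beta)) - yi for row, yi in zip(X, y)]
    return [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(len(beta))]


def lasso_fista(X, y, lam, iters=500):
    n, d = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the largest eigenvalue of X^T X / n,
    # so 1 / L is a safe (if conservative) step size.
    L = sum(a * a for row in X for a in row) / n
    beta = [0.0] * d
    z = beta[:]  # extrapolated point
    t = 1.0
    for _ in range(iters):
        grad = ls_gradient(X, y, z)
        beta_next = soft_threshold([zj - gj / L for zj, gj in zip(z, grad)], lam / L)
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        momentum = (t - 1.0) / t_next
        z = [b1 + momentum * (b1 - b0) for b1, b0 in zip(beta_next, beta)]
        beta, t = beta_next, t_next
    return beta


random.seed(1)
n, d = 100, 10
beta_true = [2.0, -3.0] + [0.0] * (d - 2)
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(a * b for a, b in zip(row, beta_true)) for row in X]
beta_hat = lasso_fista(X, y, lam=0.01)
print([round(b, 2) for b in beta_hat])
```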

Fast global convergence of gradient methods for high-dimensional statistical recovery

The theory guarantees that projected gradient descent has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter $\theta^*$ and an optimal solution $\hat{\theta}$.
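The Python sketch below shows projected gradient descent for least squares over an $\ell_1$-ball $\{\theta : \|\theta\|_1 \le R\}$, a common constraint set in this line of work. The projection uses a standard sort-based rule. The radius, the step size $1/L$ with $L = \|X\|_F^2/n$, the synthetic data, and the function names are assumptions chosen for illustration.

```python
import math
import random


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {w : ||w||_1 <= radius}; assumes radius > 0."""
    if sum(abs(a) for a in v) <= radius:
        return list(v)
    u = sorted((abs(a) for a in v), reverse=True)
    cumsum = 0.0
    level = 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        candidate = (cumsum - radius) / k
        if uk > candidate:  # holds for k = 1, ..., rho and fails afterwards
            level = candidate
    # Soft-threshold every coordinate by the computed level.
    return [math.copysign(max(abs(a) - level, 0.0), a) for a in v]


def ls_gradient(X, y, beta):
    """Gradient of (1 / 2n) * ||y - X beta||^2."""
    n = len(X)
    resid = [sum(xij * bj for xij, bj in zip(row, beta)) - yi for row, yi in zip(X, y)]
    return [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(len(beta))]


def projected_gradient_ls(X, y, radius, iters=300):
    n, d = len(X), len(X[0])
    L = sum(a * a for row in X for a in row) / n  # conservative smoothness bound
    theta = [0.0] * d
    for _ in range(iters):
        grad = ls_gradient(X, y, theta)
        theta = project_l1_ball([b - g / L for b, g in zip(theta, grad)], radius)
    return theta


random.seed(1)
n, d = 100, 10
beta_true = [2.0, -3.0] + [0.0] * (d - 2)
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(a * b for a, b in zip(row, beta_true)) for row in X]
theta_hat = projected_gradient_ls(X, y, radius=5.0)  # radius = ||beta_true||_1
print([round(b, 3) for b in theta_hat])
```

Because the data here are noise-free and the radius matches $\|\theta^*\|_1$, the iterates should approach $\theta^*$ itself. With noise, they would instead settle within the statistical precision described above.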

High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

This work is able to both analyze the statistical error associated with any global optimum, and prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers.

Restricted Eigenvalue Properties for Correlated Gaussian Designs

This paper proves directly that the restricted nullspace and eigenvalue conditions hold with high probability for quite general classes of Gaussian matrices for which the predictors may be highly dependent, and hence restricted isometry conditions can be violated with high probability.

A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

A unified framework is provided for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling; one main theorem is stated, and it is shown how it can be used to re-derive several existing results and to obtain several new ones.

Efficient Online and Batch Learning Using Forward Backward Splitting

The two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as $\ell_1$, $\ell_2$, $\ell_2^2$, and $\ell_\infty$ regularization, and it is extended with efficient implementations for very high-dimensional data with sparsity.
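For the $\ell_1$ case, the backward step reduces to coordinate-wise soft-thresholding. The following is a minimal online sketch in Python. It assumes a squared loss on a stream of $(x_t, y_t)$ pairs and a step size $\eta_t = \eta_0/\sqrt{t}$; both choices, the synthetic stream, and the function names are assumptions for illustration.

```python
import math
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(a) - tau, 0.0), a) for a in v]


def fobos_l1_least_squares(stream, dim, lam, eta0=0.05):
    """Online forward-backward splitting with squared loss and an l1 penalty."""
    beta = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        eta = eta0 / math.sqrt(t)
        resid = sum(b * xi for b, xi in zip(beta, x)) - y
        # Forward step: gradient step on 0.5 * (x . beta - y) ** 2 for this sample.
        half = [b - eta * resid * xi for b, xi in zip(beta, x)]
        # Backward step: proximal map of eta * lam * ||.||_1 (soft-thresholding).
        beta = soft_threshold(half, eta * lam)
    return beta


def synthetic_stream(T, dim, rng):
    """Hypothetical data: y = 2 x_0 - 3 x_1 + small Gaussian noise."""
    for _ in range(T):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        yield x, 2.0 * x[0] - 3.0 * x[1] + rng.gauss(0.0, 0.1)


rng = random.Random(2)
beta_online = fobos_l1_least_squares(synthetic_stream(20000, 10, rng), dim=10, lam=0.05)
print([round(b, 3) for b in beta_online])
```

Because the synthetic features are isotropic, the population objective is minimized at the soft-thresholded truth, roughly $(1.95, -2.95, 0, \dots, 0)$ for $\lambda = 0.05$. This is the point the final iterate should approach.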

Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization, II: Shrinking Procedures and Optimal Algorithms

A multistage AC-SA algorithm is introduced, which possesses an optimal rate of convergence for solving strongly convex SCO problems in terms of the dependence on not only the target accuracy, but also a number of problem parameters and the selection of initial points.

Gradient methods for minimizing composite objective function

In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.
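As a toy illustration of distributed dual averaging, the Python sketch below runs $m$ nodes on a ring. Each node holds a private scalar loss $f_i(x) = \tfrac12(x - a_i)^2$, so the network-wide minimizer is the mean of the $a_i$. At each step, a node does four things:

1. It mixes its dual variable with its neighbors' through a doubly stochastic matrix.
2. It adds its local gradient.
3. It maps back to the primal interval $[-B, B]$ using the proximal function $\tfrac12 x^2$.
4. It updates a running average of its iterates.

The ring topology (assumed to have $m \ge 3$ nodes), the $1/3$ mixing weights, the step size $c/\sqrt{t}$, and the function names are assumptions, not the paper's experimental setup.

```python
import math


def ring_mixing_matrix(m):
    """Doubly stochastic weights: each node averages itself and its two ring neighbors (m >= 3)."""
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            P[i][j % m] += 1.0 / 3.0
    return P


def distributed_dual_averaging(a, T=20000, c=1.0, B=10.0):
    m = len(a)
    P = ring_mixing_matrix(m)
    z = [0.0] * m     # dual variables (mixed, accumulated gradients)
    x = [0.0] * m     # primal iterates
    sums = [0.0] * m  # accumulators for running averages of the primal iterates
    for t in range(1, T + 1):
        g = [xi - ai for xi, ai in zip(x, a)]  # local gradients of f_i at x_i
        z = [sum(P[i][j] * z[j] for j in range(m)) + g[i] for i in range(m)]
        alpha = c / math.sqrt(t)
        # argmin over [-B, B] of z_i * x + x**2 / (2 * alpha) is a clipped scaling.
        x = [min(max(-alpha * zi, -B), B) for zi in z]
        sums = [s + xi for s, xi in zip(sums, x)]
    return [s / T for s in sums]


local_targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
estimates = distributed_dual_averaging(local_targets)
print([round(e, 3) for e in estimates])  # each entry should be near 3.5
```

Replacing the ring with a better-connected graph enlarges the spectral gap of the mixing matrix. In line with the summary above, that should shrink the disagreement between nodes for a given number of iterations.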

Robust Stochastic Approximation Approach to Stochastic Programming

It is intended to demonstrate that a properly modified SA approach can be competitive and even significantly outperform the SAA method for a certain class of convex stochastic problems.
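To illustrate the kind of method meant here, the Python sketch below applies a robust SA-style scheme to the nonsmooth problem $\min_{\theta \in [-R, R]} \mathbb{E}|\theta - X|$, whose solution is the median of $X$. The scheme takes projected stochastic subgradient steps with a constant step size chosen in advance from the horizon $N$, then averages the iterates. The Euclidean setup, the step size $\gamma = R/\sqrt{N}$, the synthetic data, and the function name are simplifying assumptions rather than the paper's exact constants.

```python
import math
import random


def robust_sa_median(samples, radius):
    """Projected stochastic subgradient descent with a constant step and iterate averaging.

    Minimizes E|theta - X| over [-radius, radius]; the minimizer is the median of X.
    """
    n = len(samples)
    gamma = radius / math.sqrt(n)  # constant step size fixed from the horizon n
    theta = 0.0
    theta_avg = 0.0
    for t, x in enumerate(samples, start=1):
        g = (theta > x) - (theta < x)                         # subgradient of |theta - x|
        theta = min(max(theta - gamma * g, -radius), radius)  # step, then project onto [-R, R]
        theta_avg += (theta - theta_avg) / t                  # running average of iterates
    return theta_avg


random.seed(3)
data = [random.gauss(1.0, 2.0) for _ in range(20000)]
med = robust_sa_median(data, radius=10.0)
print(f"estimated median {med:.3f}")
```

The individual iterates keep fluctuating at a scale set by $\gamma$, while their average settles near the median. This averaging is what makes the constant-step scheme usable.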