A Proximal Stochastic Gradient Method with Progressive Variance Reduction
This work proposes and analyzes a new proximal stochastic gradient method, which uses a multi-stage scheme to progressively reduce the variance of the stochastic gradient.
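A sketch of the core update (standard Prox-SVRG notation, assumed here rather than taken from the summary above): at each inner iteration a variance-reduced gradient is formed from a full gradient that is recomputed only occasionally at a reference point $\tilde{x}$, and the regularizer $R$ is handled by a proximal step,

$$
v_t = \nabla f_{i_t}(x_{t-1}) - \nabla f_{i_t}(\tilde{x}) + \nabla F(\tilde{x}),
\qquad
x_t = \operatorname{prox}_{\eta R}\!\bigl(x_{t-1} - \eta\, v_t\bigr),
$$

where $F$ is the average of the component losses $f_i$ and $\tilde{x}$ is refreshed at the start of each stage. Since $\mathbb{E}[v_t] = \nabla F(x_{t-1})$ and the variance of $v_t$ shrinks as the iterates approach the optimum, a constant step size $\eta$ can be used.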
Stochastic dual coordinate ascent methods for regularized loss
A new analysis of Stochastic Dual Coordinate Ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
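A minimal sketch of the SDCA template (notation assumed, not quoted from the entry): for regularized loss minimization $\min_w \frac{1}{n}\sum_i \phi_i(w^{\top} x_i) + \frac{\lambda}{2}\|w\|^2$, each iteration picks one dual coordinate $\alpha_i$ at random and maximizes the dual objective over it, keeping the primal-dual link

$$
w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i x_i,
\qquad
D(\alpha) = \frac{1}{n}\sum_{i=1}^{n} -\phi_i^{*}(-\alpha_i) - \frac{\lambda}{2}\,\|w(\alpha)\|^{2},
$$

so that the duality gap $P(w(\alpha)) - D(\alpha)$ certifies the accuracy of the primal iterate.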
Nonlinear Learning using Local Coordinate Coding
It is shown that a high dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding.
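The approximation behind local coordinate coding can be written as follows (a sketch; the symbols are illustrative): a point $x$ is coded by coefficients $\gamma_v(x)$ over nearby anchor points $v$ in a codebook $C$, and a smooth nonlinear $f$ is then approximately linear in the code,

$$
x \approx \sum_{v \in C} \gamma_v(x)\, v,
\qquad
f(x) \approx \sum_{v \in C} \gamma_v(x)\, f(v),
$$

with the error controlled by how local the coding is, which is why a linear model over the code values $f(v)$ can represent a nonlinear function.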
Analysis of Multi-stage Convex Relaxation for Sparse Regularization
- Tong Zhang
- Computer Science · Journal of Machine Learning Research
- 1 March 2010
A multi-stage convex relaxation scheme is presented for solving problems with non-convex, sparsely regularized objective functions, and it is shown that the local solution obtained by this procedure is superior to the global solution of the standard L1 convex relaxation for learning sparse targets.
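A rough sketch of the multi-stage idea (an iteratively reweighted L1 scheme; the weighting rule below is one common instantiation, e.g. for capped-L1 regularization, and is an assumption rather than a quote from the paper): each stage solves a convex weighted-L1 problem whose weights come from the previous stage's solution, so coefficients that are already large are penalized less in later stages,

$$
w^{(k+1)} = \arg\min_{w}\; L(w) + \sum_{j} \lambda_j^{(k)} \lvert w_j \rvert,
\qquad
\lambda_j^{(k)} = \lambda \cdot \mathbf{1}\!\bigl(\lvert w_j^{(k)} \rvert < \theta\bigr),
$$

with stage $k=0$ starting from the standard L1 solution.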
A Spectral Algorithm for Learning Hidden Markov Models
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
- O. Shamir, Nathan Srebro, Tong Zhang
- Computer Science · International Conference on Machine Learning
- 30 December 2013
A novel Newton-type method for distributed optimization is presented; it is particularly well suited for stochastic optimization and learning problems, and it enjoys a linear rate of convergence that provably improves with the data size.
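A compressed sketch of the DANE-style iteration (the exact correction and regularization terms below are stated from memory and should be treated as assumptions): each of the $m$ machines solves a local subproblem built from its own objective $\phi_i$ but corrected by the global gradient, and the results are averaged,

$$
w_i^{(t)} = \arg\min_{w}\; \phi_i(w) - \bigl(\nabla\phi_i(w^{(t-1)}) - \eta\,\nabla\phi(w^{(t-1)})\bigr)^{\!\top} w + \frac{\mu}{2}\,\|w - w^{(t-1)}\|^{2},
\qquad
w^{(t)} = \frac{1}{m}\sum_{i=1}^{m} w_i^{(t)},
$$

which requires only one round of gradient averaging per iteration, hence the communication efficiency.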
Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization
The runtime of the framework is analyzed, and rates are obtained that improve on state-of-the-art results for several key machine learning optimization problems, including SVM, logistic regression, ridge regression, Lasso, and multiclass SVM.
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
The performance of SGD without non-trivial smoothness assumptions is investigated, along with running-average schemes that convert the SGD iterates into a solution with optimal optimization accuracy; a new and simple averaging scheme is proposed that not only attains optimal rates but can also be computed on-the-fly.
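To illustrate the on-the-fly aspect, here is a minimal Python sketch of a polynomial-decay running average of SGD iterates (the toy objective, step-size rule, and the parameter `eta` are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def sgd_with_decay_averaging(grad, x0, steps, step_size, eta=3.0):
    """Run SGD and keep a polynomial-decay running average on the fly.

    grad(x, t) returns a stochastic (sub)gradient at x on step t.
    The average is updated as
        avg <- (1 - (eta + 1)/(t + eta)) * avg + (eta + 1)/(t + eta) * x,
    which weights later iterates more heavily than uniform averaging,
    using O(d) memory and no stored history.
    """
    x = np.asarray(x0, dtype=float)
    avg = x.copy()
    for t in range(1, steps + 1):
        x = x - step_size(t) * grad(x, t)   # plain SGD step
        w = (eta + 1.0) / (t + eta)         # polynomial-decay weight
        avg = (1.0 - w) * avg + w * x       # running-average update
    return x, avg

# Toy usage: minimize the non-smooth f(x) = |x - 1| from noisy subgradients.
rng = np.random.default_rng(0)
subgrad = lambda x, t: np.sign(x - 1.0) + 0.1 * rng.standard_normal(x.shape)
last, averaged = sgd_with_decay_averaging(
    subgrad, x0=np.zeros(1), steps=5000, step_size=lambda t: 1.0 / np.sqrt(t))
```

Averaging schemes of this form avoid storing past iterates while achieving the optimal rates discussed above.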
Multi-Label Prediction via Compressed Sensing
It is shown that the number of subproblems need only be logarithmic in the total number of possible labels, making this approach radically more efficient than others.
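The pipeline behind this result can be summarized as follows (a sketch; the symbols and the choice of decoder are assumptions): rather than learning one predictor per label, the $k$-sparse label vector $y \in \{0,1\}^{d}$ is compressed with a random matrix and only the compressed targets are regressed on,

$$
z = A y, \quad A \in \mathbb{R}^{m \times d}, \quad m = O(k \log d),
\qquad
\hat{z} = h(x), \qquad \hat{y} = \mathrm{decode}(\hat{z}, A),
$$

where the decoder is any sparse-recovery algorithm, so the number of regression subproblems is $m$ rather than $d$.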
A tail inequality for quadratic forms of subgaussian random vectors
An exponential probability tail inequality for positive semidefinite quadratic forms in a subgaussian random vector is proved, analogous to one that holds when the vector has independent Gaussian entries.
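For reference, the bound has roughly the following form (stated from memory and to be checked against the paper; $x$ is a $\sigma$-subgaussian random vector, $A$ is a fixed matrix, and $\Sigma = A^{\top} A$):

$$
\Pr\!\Bigl[\|A x\|^{2} > \sigma^{2}\bigl(\operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^{2})\, t} + 2\|\Sigma\|\, t\bigr)\Bigr] \le e^{-t}
\quad \text{for all } t > 0,
$$

mirroring the corresponding bound for a vector with independent Gaussian entries.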