Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
The method is proved to enjoy the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG), with an analysis that is significantly simpler and more intuitive.
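As a rough illustration of the variance-reduction idea summarized above, the following is a minimal sketch on a one-dimensional least-squares problem, not the paper's reference implementation. The function name `svrg_least_squares`, the toy data, and the values of `step`, `epochs`, and `inner_steps` are all assumptions chosen for this example. Each outer pass computes a full gradient at a snapshot point, and each inner step corrects a single-sample gradient with it.

```python
import random


def svrg_least_squares(xs, ys, step=0.02, epochs=30, inner_steps=50, seed=0):
    """Minimise (1/2n) * sum_i (w * x_i - y_i)^2 over a scalar w.

    Illustrative sketch only: hyperparameters are assumptions, not tuned values.
    """
    rng = random.Random(seed)
    n = len(xs)

    def grad_i(w, i):
        # Gradient of the i-th loss term 0.5 * (w * x_i - y_i)^2.
        return (w * xs[i] - ys[i]) * xs[i]

    w_snap = 0.0
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer pass.
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        w = w_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced direction: unbiased for the full gradient at w,
            # with variance that shrinks as w and w_snap approach the optimum.
            direction = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= step * direction
        w_snap = w  # use the last inner iterate as the next snapshot
    return w_snap


# Toy data (hypothetical), not exactly fit by any single w.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w_hat = svrg_least_squares(xs, ys)
# Closed-form minimiser for comparison.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(w_hat, w_star)
```

Unlike plain SGD with a constant step, the corrected direction lets the iterate settle at the minimiser without a decaying step size, which is the behaviour the summary's convergence claim refers to.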
Stochastic dual coordinate ascent methods for regularized loss
A new analysis of stochastic dual coordinate ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
A Proximal Stochastic Gradient Method with Progressive Variance Reduction
This work proposes and analyzes a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient.
Nonlinear Learning using Local Coordinate Coding
It is shown that a high-dimensional nonlinear function can be approximated by a global linear function with respect to this coding scheme, and the approximation quality is ensured by the locality of such coding.
Analysis of Multi-stage Convex Relaxation for Sparse Regularization
  • Tong Zhang
  • Mathematics, Computer Science
  • J. Mach. Learn. Res.
  • 1 March 2010
A multi-stage convex relaxation scheme is presented for solving sparse regularization problems with non-convex objective functions, and it is shown that the local solution obtained by this procedure is superior to the global solution of the standard L1 convex relaxation for learning sparse targets.
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
In addition to a straightforward adaptation of CNN from image to text, a simple but new variation that employs bag-of-word conversion in the convolution layer is proposed, and an extension that combines multiple convolution layers is explored for higher accuracy.
Multi-Label Prediction via Compressed Sensing
It is shown that the number of subproblems need only be logarithmic in the total number of possible labels, making this approach radically more efficient than others.
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
A novel Newton-type method for distributed optimization is presented; it is particularly well suited to stochastic optimization and learning problems and enjoys a linear rate of convergence that provably improves with the data size.
Image Classification Using Super-Vector Coding of Local Image Descriptors
This paper introduces a new framework for image classification using local visual descriptors. The pipeline first performs a non-linear feature transformation on descriptors, then aggregates the…
Truncated power method for sparse eigenvalue problems
A strong sparse recovery result is proved for the truncated power method, and this theory is the key motivation for developing the new algorithm.
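For readers unfamiliar with the idea, the sketch below shows a generic truncate-and-normalize power iteration for a k-sparse dominant eigenvector. It is a minimal illustration under stated assumptions, not necessarily the paper's exact procedure: the function name `truncated_power_method`, the fixed iteration count, the starting vector, and the toy matrix are all hypothetical.

```python
import math


def truncated_power_method(A, k, x0, iters=100):
    """Approximate a k-sparse dominant eigenvector of a symmetric matrix A.

    A is a list of rows; x0 is a starting vector. Illustrative sketch only.
    """
    n = len(A)
    x = list(x0)
    for _ in range(iters):
        # Power step: y = A x.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Truncation: keep the k entries of largest magnitude, zero the rest.
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:k])
        y = [y[i] if i in keep else 0.0 for i in range(n)]
        # Renormalise to unit Euclidean length (assumes y is not all zeros).
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


# Toy block-diagonal matrix (hypothetical): the dominant eigenvector is
# supported on the first two coordinates only.
A = [
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
x = truncated_power_method(A, k=2, x0=[1.0, 0.5, 0.5, 0.5])
print(x)
```

The result depends on the starting vector, since truncation can lock in a poor support early on; this sketch makes no attempt at a careful initialization.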