Publications
Stop Wasting My Gradients: Practical SVRG
TLDR
This work shows how to exploit support vectors to reduce the number of gradient computations in the later iterations of stochastic variance-reduced gradient methods and proves that the commonly-used regularized SVRG iteration is justified and improves the convergence rate.
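Since this TLDR centers on skipping gradient evaluations for examples identified as non-support vectors inside SVRG, a minimal NumPy sketch may help. The squared-hinge objective, the skip flag, and all names below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def loss_grad(w, x, yi):
    # Squared-hinge loss gradient for one example (illustrative choice):
    # it is zero whenever the margin is >= 1, i.e. the example is not a support vector.
    margin = yi * x.dot(w)
    return -yi * max(0.0, 1.0 - margin) * x

def svrg_skip(X, y, lam=1e-3, eta=0.1, epochs=10, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w_snap = np.zeros(d)
    for _ in range(epochs):
        per_example = np.array([loss_grad(w_snap, X[i], y[i]) for i in range(n)])
        skip = ~per_example.any(axis=1)        # flagged as non-support vectors at the snapshot
        mu = per_example.mean(axis=0) + lam * w_snap   # full gradient at the snapshot
        w = w_snap.copy()
        for _ in range(n):
            i = rng.integers(n)
            if skip[i]:
                # Heuristic saving: assume the loss gradient is still zero at w,
                # so only the regularizer difference and mu remain.
                direction = lam * (w - w_snap) + mu
            else:
                direction = (loss_grad(w, X[i], y[i]) + lam * w
                             - per_example[i] - lam * w_snap + mu)
            w -= eta * direction
        w_snap = w
    return w_snap
```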
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
TLDR
This work describes a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, proposes a non-uniform sampling scheme that substantially improves practical performance, and analyzes the rate of convergence of the SAGA variant under non-uniform sampling.
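The combination of a compact gradient memory and non-uniform sampling can be illustrated with a hedged SAGA-style sketch for plain logistic regression; the Lipschitz-proportional probabilities, the scalar gradient memory, and the step size are assumptions for illustration, not the paper's CRF-specific algorithm.

```python
import numpy as np

def saga_nonuniform(X, y, lam=1e-3, epochs=10, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Per-example smoothness constants for logistic loss on a linear model.
    L = 0.25 * np.einsum('ij,ij->i', X, X)
    p = L / L.sum()                          # sample "harder" examples more often
    eta = 1.0 / (3.0 * (L.max() + lam))      # conservative step size
    w = np.zeros(d)
    alpha = np.zeros(n)                      # memory: one scalar loss derivative per example
    avg_grad = np.zeros(d)                   # running average of the stored loss gradients
    for _ in range(epochs * n):
        i = rng.choice(n, p=p)
        s = -y[i] / (1.0 + np.exp(y[i] * X[i].dot(w)))   # fresh scalar derivative
        g_new, g_old = s * X[i], alpha[i] * X[i]
        # Importance weighting by 1 / (n * p_i) keeps the gradient estimate unbiased.
        w -= eta * ((g_new - g_old) / (n * p[i]) + avg_grad + lam * w)
        avg_grad += (g_new - g_old) / n
        alpha[i] = s
    return w
```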
M-ADDA: Unsupervised Domain Adaptation with Deep Metric Learning
TLDR
It is shown that M-ADDA performs significantly better on the digits adaptation datasets of MNIST and USPS, suggesting that using metric learning for domain adaptation can lead to large improvements in classification accuracy.
A Generic Top-N Recommendation Framework for Trading-Off Accuracy, Novelty, and Coverage
TLDR
This work presents an approach that relies on historical rating data to learn user long-tail novelty preferences, integrates these preferences into a generic re-ranking framework that customizes the balance between accuracy and coverage, and empirically validates that the proposed framework increases the novelty of recommendations.
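The re-ranking idea is generic enough to sketch: score each candidate by a weighted blend of predicted relevance and novelty, with the weight set per user, then keep the top N. The scoring function, the `user_pref` parameter, and the input format are hypothetical; the paper's learned long-tail preferences and exact framework are not reproduced here.

```python
import numpy as np

def rerank_top_n(relevance, novelty, user_pref, n=10):
    """Hypothetical re-ranking: blend accuracy and novelty per user.

    relevance, novelty: per-item scores in [0, 1];
    user_pref: this user's long-tail preference in [0, 1]
    (0 = only accuracy, 1 = only novelty).
    """
    score = (1.0 - user_pref) * relevance + user_pref * novelty
    return np.argsort(-score)[:n]            # indices of the top-N items

# Toy usage: a novelty-seeking user pulls long-tail items toward the top of the list.
rng = np.random.default_rng(0)
relevance, novelty = rng.random(100), rng.random(100)
print(rerank_top_n(relevance, novelty, user_pref=0.7, n=5))
```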
Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions
TLDR
This work presents a new stochastic method for variational inference that exploits the geometry of the variational-parameter space, yields simple closed-form updates even for non-conjugate models, and comes with a convergence-rate analysis.
Reducing the variance in online optimization by transporting past gradients
TLDR
This work proposes implicit gradient transport (IGT), which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly, and shows that it yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
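A minimal sketch of the transport idea, under the equal-Hessians reasoning quoted above: the fresh stochastic gradient is evaluated at an extrapolated point so that averaging it with the previous estimate implicitly transports that estimate to the current iterate. The anchoring schedule gamma_t = t/(t+1), the toy objective, and the names are my assumptions, not the paper's exact algorithm.

```python
import numpy as np

def igt_sgd(stoch_grad, theta0, lr=0.05, steps=2000, seed=0):
    """stoch_grad(theta, rng) returns an unbiased stochastic gradient at theta."""
    rng = np.random.default_rng(seed)
    theta, theta_prev = theta0.copy(), theta0.copy()
    g = stoch_grad(theta, rng)                     # running gradient estimate
    for t in range(1, steps + 1):
        gamma = t / (t + 1.0)
        # Evaluate the fresh gradient at an extrapolated point; when all Hessians are
        # equal, gamma * g_old + (1 - gamma) * grad(z) is an unbiased gradient at theta.
        z = theta + (gamma / (1.0 - gamma)) * (theta - theta_prev)
        g = gamma * g + (1.0 - gamma) * stoch_grad(z, rng)
        theta_prev = theta.copy()
        theta = theta - lr * g
    return theta

# Toy quadratic with noisy gradients: f(theta) = 0.5 * ||theta - 1||^2.
noisy_grad = lambda theta, rng: (theta - 1.0) + rng.standard_normal(theta.shape)
print(igt_sgd(noisy_grad, np.zeros(3)))            # should land close to the optimum at 1
```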
Convergence of Proximal-Gradient Stochastic Variational Inference under Non-Decreasing Step-Size Sequence
TLDR
This work introduces a new stochastic-approximation method that uses a proximal-gradient framework and establishes the convergence of the method under a "non-decreasing" step-size schedule, which has both theoretical and practical advantages.
Infinite-Dimensional Game Optimization via Variational Transport (OPT2020: 12th Annual Workshop on Optimization for Machine Learning)
Game optimization has been extensively studied when decision variables lie in a finite-dimensional space, whose solutions correspond to pure strategies at the Nash equilibrium (NE), and the …
To Each Optimizer a Norm, To Each Norm its Generalization
TLDR
This work proves that, for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions, and proposes techniques to bias optimizers towards better-generalizing solutions, improving their test performance.
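A small numerical illustration of the two claims in the TLDR (my own toy example, not the paper's code): gradient descent started at zero converges to the minimum-l2-norm interpolant, and projecting a different interpolating solution onto the row space of the data recovers that same point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                    # over-parameterized: d > n
X, y = rng.standard_normal((n, d)), rng.standard_normal(n)

# Gradient descent on 0.5 * ||Xw - y||^2 from w = 0 picks out the min-norm interpolant.
w = np.zeros(d)
eta = 1.0 / np.linalg.norm(X, 2) ** 2
for _ in range(20000):
    w -= eta * X.T @ (X @ w - y)
w_min = np.linalg.pinv(X) @ y                     # minimum-l2-norm solution

# Any other interpolant differs by a null-space component; projecting onto the
# row space of X moves it back to the minimum-norm interpolating solution.
null_dir = np.linalg.svd(X)[2][-1]                # direction with X @ null_dir ~ 0
w_other = w_min + 3.0 * null_dir
P_row = X.T @ np.linalg.pinv(X @ X.T) @ X
print(np.allclose(w, w_min, atol=1e-5), np.allclose(P_row @ w_other, w_min, atol=1e-8))
```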