Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TLDR
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function chosen in hindsight.
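As a rough illustration of the kind of update this describes, here is a minimal numpy sketch of the diagonal variant of the adaptive scheme (per-coordinate steps scaled by accumulated squared gradients); the function name and the toy objective are illustrative, not from the paper.

```python
import numpy as np

def adagrad_sketch(grad_fn, x0, eta=0.1, eps=1e-8, steps=100):
    """Diagonal adaptive scheme: per-coordinate step sizes scaled by
    the inverse square root of accumulated squared gradients."""
    x = np.asarray(x0, dtype=float).copy()
    g_sq = np.zeros_like(x)                      # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)                           # (sub)gradient at the current iterate
        g_sq += g ** 2
        x -= eta * g / (np.sqrt(g_sq) + eps)     # adaptive per-coordinate step
    return x

# Example: minimize the toy quadratic f(x) = ||x - 1||^2 / 2.
print(adagrad_sketch(lambda x: x - 1.0, x0=np.zeros(3)))
```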
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
TLDR
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.
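A minimal sketch of the distributed dual-averaging pattern under simplifying assumptions (fixed doubly stochastic mixing matrix, Euclidean prox, no constraint set); the node objectives and step sizes below are illustrative only.

```python
import numpy as np

def distributed_dual_averaging(grad_fns, P, dim, steps=200):
    """Each node mixes its dual variable z with its neighbors' (via the
    doubly stochastic matrix P), adds its local subgradient, and maps z
    back to a primal iterate with a Euclidean prox step."""
    n = len(grad_fns)
    z = np.zeros((n, dim))                   # dual (averaged gradient) states
    x = np.zeros((n, dim))                   # primal iterates
    x_avg = np.zeros((n, dim))
    for t in range(1, steps + 1):
        grads = np.array([g(x[i]) for i, g in enumerate(grad_fns)])
        z = P @ z + grads                    # consensus step on the dual variables
        alpha = 1.0 / np.sqrt(t)             # decaying step size
        x = -alpha * z                       # Euclidean prox, unconstrained case
        x_avg += (x - x_avg) / t             # running average of the iterates
    return x_avg

# Example: 3 nodes, each with a local quadratic f_i(x) = ||x - c_i||^2 / 2.
P = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
centers = [np.zeros(2), np.ones(2), 2 * np.ones(2)]
grads = [lambda x, c=c: x - c for c in centers]
print(distributed_dual_averaging(grads, P, dim=2))   # nodes approach the mean of the c_i
```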
Efficient projections onto the l1-ball for learning in high dimensions
TLDR
Efficient algorithms for projecting a vector onto the l1-ball are described, and variants of stochastic gradient projection methods augmented with these projection procedures are shown to outperform interior-point methods, widely considered state-of-the-art optimization techniques.
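A sketch of the standard sort-and-threshold projection discussed here (the paper also develops an expected linear-time variant); the function name is illustrative.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the l1-ball of the given radius,
    via sorting the magnitudes and soft-thresholding (O(n log n))."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= radius:
        return v.copy()                        # already inside the ball
    u = np.sort(np.abs(v))[::-1]               # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1)    # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

# Example: the projection has l1 norm exactly equal to the radius.
w = project_l1_ball(np.array([0.8, -0.6, 0.4]), radius=1.0)
print(w, np.abs(w).sum())
```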
Efficient Online and Batch Learning Using Forward Backward Splitting
TLDR
The two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as l1, l2, l2², and l∞ regularization, and is extended with efficient implementations for very high-dimensional sparse data.
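A minimal sketch of the two-phase update for the l1 case, where the backward (proximal) phase has a closed-form soft-threshold; the function name and toy numbers are illustrative.

```python
import numpy as np

def fobos_l1_step(x, grad, eta, lam):
    """One forward-backward splitting step with l1 regularization:
    (1) an unconstrained gradient step, then (2) a closed-form proximal
    step that soft-thresholds the result, producing sparsity."""
    y = x - eta * grad                                          # forward (gradient) phase
    return np.sign(y) * np.maximum(np.abs(y) - eta * lam, 0.0)  # backward (prox) phase

# Example: one step on f(x) = ||x - b||^2 / 2 + lam * ||x||_1 from x = 0.
b = np.array([1.0, -0.2, 0.05])
x = np.zeros(3)
x = fobos_l1_step(x, grad=x - b, eta=0.5, lam=0.3)
print(x)   # small coordinates of b are zeroed out
```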
Local privacy and statistical minimax rates
TLDR
Bounds on information-theoretic quantities that influence estimation rates as a function of the amount of privacy preserved can be viewed as quantitative data-processing inequalities, allowing precise characterization of statistical rates under local privacy constraints.
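The paper's contribution is minimax lower bounds rather than a particular mechanism, but a tiny sketch of an alpha-locally-private Laplace channel may help make the privacy/accuracy trade-off concrete; everything below is illustrative.

```python
import numpy as np

def locally_private_mean(x, alpha, rng):
    """Each sample in [-1, 1] is released through an alpha-locally-private
    Laplace channel (noise scale 2/alpha, since the range has width 2),
    and the analyst averages the noisy reports; smaller alpha (more
    privacy) yields a noisier estimate."""
    z = x + rng.laplace(scale=2.0 / alpha, size=len(x))   # per-sample privatization
    return z.mean()

rng = np.random.default_rng(1)
x = np.clip(rng.normal(0.3, 0.2, size=10_000), -1, 1)
print(locally_private_mean(x, alpha=0.5, rng=rng),
      locally_private_mean(x, alpha=4.0, rng=rng))
```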
Unlabeled Data Improves Adversarial Robustness
TLDR
It is proved that unlabeled data bridges the complexity gap between standard and robust classification: a simple semi-supervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for high standard accuracy.
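A minimal sketch of plain self-training with a hypothetical logistic-regression base learner; the paper combines this pseudo-labeling step with robust training of the final model, which is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_lab, y_lab, X_unlab):
    """Self-training in its simplest form: fit on the labeled data,
    pseudo-label the unlabeled pool, then refit on the union."""
    base = LogisticRegression().fit(X_lab, y_lab)
    pseudo = base.predict(X_unlab)                     # intermediate model labels the pool
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    return LogisticRegression().fit(X_all, y_all)      # final model uses both sets

# Example with toy Gaussian blobs: few labels, many unlabeled points.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
model = self_training(X_lab, y_lab, X_unlab)
```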
Distributed delayed stochastic optimization
TLDR
This work exhibits n-node architectures whose optimization error on stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)) after T iterations, a rate known to be optimal for a distributed system with n nodes even in the absence of delays.
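A toy sketch of the delayed-gradient setting, where the update applied at step t uses a gradient computed at a stale iterate; the delay model and step sizes are illustrative, not the paper's architecture.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad_fn, x0, delay=3, eta0=0.5, steps=500):
    """Master-worker style update in which the gradient applied at step t
    was computed at the iterate from `delay` steps earlier."""
    x = np.asarray(x0, dtype=float).copy()
    stale = deque([x.copy()] * delay, maxlen=delay)   # buffer of delayed iterates
    for t in range(1, steps + 1):
        g = grad_fn(stale[0])             # gradient evaluated at a stale point
        stale.append(x.copy())
        x -= (eta0 / np.sqrt(t)) * g      # decaying step size
    return x

# Example: noisy gradients of f(x) = ||x||^2 / 2, applied with a delay of 3.
rng = np.random.default_rng(0)
print(delayed_sgd(lambda x: x + 0.1 * rng.normal(size=x.shape), np.ones(2)))
```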
Certifying Some Distributional Robustness with Principled Adversarial Training
TLDR
By considering a Lagrangian penalty formulation of perturbing the underlying data distribution within a Wasserstein ball, this work provides a training procedure that augments model parameter updates with worst-case perturbations of the training data and efficiently certifies robustness of the population loss.
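A minimal sketch of a Lagrangian-penalty training loop on a linear least-squares model: an inner ascent finds a penalized worst-case perturbation of each input, then the parameters take a descent step at the perturbed point. The model, penalty value, and step sizes are illustrative, not the paper's experimental setup.

```python
import numpy as np

def robust_step(theta, x, y, gamma=10.0, eta=0.1, inner_steps=15, inner_lr=0.05):
    """One penalized adversarial-training step on a linear model with squared
    loss: (1) ascend on loss(x + delta) - gamma * ||delta||^2 / 2 to find a
    worst-case perturbation, (2) descend on the loss at the perturbed input."""
    delta = np.zeros_like(x)
    for _ in range(inner_steps):                       # inner maximization over delta
        resid = theta @ (x + delta) - y
        delta += inner_lr * (resid * theta - gamma * delta)
    resid = theta @ (x + delta) - y
    return theta - eta * resid * (x + delta)           # outer descent on theta

# Example: a few passes over a toy regression problem.
rng = np.random.default_rng(0)
X, w_true = rng.normal(size=(50, 3)), np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)
theta = np.zeros(3)
for _ in range(20):
    for xi, yi in zip(X, y):
        theta = robust_step(theta, xi, yi)
print(theta)
```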
Composite Objective Mirror Descent
TLDR
This work presents a new method for regularized convex optimization that unifies previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, and derives specific instantiations of the method for commonly used regularization functions such as l1, mixed-norm, and trace-norm regularization.
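Two illustrative instantiations of the composite mirror-descent template, assuming an entropic Bregman divergence on the simplex in one case and a Euclidean divergence with l1 regularization in the other; the function names are mine.

```python
import numpy as np

def entropic_step(x, grad, eta):
    """Entropic Bregman divergence on the probability simplex with no
    composite term: the exponentiated-gradient / mirror-descent special case."""
    w = x * np.exp(-eta * grad)
    return w / w.sum()

def euclidean_l1_step(x, grad, eta, lam):
    """Euclidean Bregman divergence with an l1 composite term: a gradient
    step followed by soft-thresholding, recovering forward-backward splitting."""
    y = x - eta * grad
    return np.sign(y) * np.maximum(np.abs(y) - eta * lam, 0.0)

# Example: one entropic step starting from the simplex center.
x = np.full(3, 1.0 / 3)
print(entropic_step(x, grad=np.array([1.0, 0.0, -1.0]), eta=0.5))
```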
Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates
TLDR
It is established that, despite the computational speed-up, statistical optimality is retained: as long as the number of partitions m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the full set of N samples.
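A minimal sketch of the partition-and-average scheme with a Gaussian kernel; the split, kernel, and regularization choices below are illustrative only.

```python
import numpy as np

def dc_krr_fit(X, y, m, lam, gamma=1.0):
    """Divide-and-conquer kernel ridge regression: split the N samples into
    m partitions, fit an independent Gaussian-kernel KRR on each, and
    average the m local predictors at query time."""
    parts = np.array_split(np.random.permutation(len(X)), m)
    models = []
    for idx in parts:
        Xi, yi = X[idx], y[idx]
        K = np.exp(-gamma * np.square(Xi[:, None] - Xi[None, :]).sum(-1))
        alpha = np.linalg.solve(K + lam * len(idx) * np.eye(len(idx)), yi)
        models.append((Xi, alpha))
    def predict(Xq):
        preds = []
        for Xi, alpha in models:
            Kq = np.exp(-gamma * np.square(Xq[:, None] - Xi[None, :]).sum(-1))
            preds.append(Kq @ alpha)
        return np.mean(preds, axis=0)       # average the local estimators
    return predict

# Example: a noisy sine curve split across 4 partitions.
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, (400, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=400)
predict = dc_krr_fit(X, y, m=4, lam=1e-3)
print(predict(np.array([[1.0], [2.0]])))
```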
...