Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- John C. Duchi, Elad Hazan, Y. Singer
- Computer Science, Journal of Machine Learning Research
- 1 February 2011
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and yields regret guarantees provably as good as those of the best proximal function chosen in hindsight.
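The best-known instance of this idea is diagonal AdaGrad: each coordinate's effective step size shrinks with that coordinate's accumulated squared gradients. A minimal sketch (the function name and toy objective are illustrative, not from the paper):

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.5, eps=1e-8, steps=1000):
    """Minimal diagonal AdaGrad sketch: per-coordinate step sizes shrink
    with the running sum of squared gradients, which is what removes most
    manual learning-rate tuning."""
    x = np.array(x0, dtype=float)
    g_sq = np.zeros_like(x)            # accumulated squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        g_sq += g * g
        x -= lr * g / (np.sqrt(g_sq) + eps)
    return x

# minimize f(x) = ||x||^2 / 2, whose gradient is x
x_star = adagrad(lambda x: x, x0=[5.0, -3.0])
```

Coordinates that see large gradients are automatically damped, so a single scalar `lr` works across features of very different scales.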
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
- John C. Duchi, Alekh Agarwal, M. Wainwright
- Computer Science, IEEE Transactions on Automatic Control
- 12 May 2010
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.
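A toy sketch of the scheme on a scalar consensus problem (the setup below — quadratic local objectives, a 4-node cycle, prox function x²/2 — is an illustrative assumption, not the paper's experiments): each node mixes dual variables with its neighbors via a doubly stochastic matrix, adds its local subgradient, and takes a prox step with a decaying step size.

```python
import numpy as np

def dual_averaging_consensus(c, P, T=5000, a=1.0):
    """Toy distributed dual averaging sketch: node i holds
    f_i(x) = 0.5 * (x - c[i])**2, duals z are averaged over the network
    (doubly stochastic P), and x_i = -alpha(t) * z_i is the prox step
    for psi(x) = x**2 / 2 with alpha(t) = a / sqrt(t).
    The shared minimizer of sum_i f_i is mean(c)."""
    z = np.zeros(len(c))
    x = np.zeros(len(c))
    for t in range(1, T + 1):
        g = x - c                      # local subgradients
        z = P @ z + g                  # mix duals with neighbors, add gradient
        x = -(a / np.sqrt(t)) * z      # prox step for psi(x) = x^2 / 2
    return x

# 4-node cycle with uniform averaging weights (spectral gap 0.5)
P = np.array([[.50, .25, .00, .25],
              [.25, .50, .25, .00],
              [.00, .25, .50, .25],
              [.25, .00, .25, .50]])
x = dual_averaging_consensus(np.array([0.0, 1.0, 2.0, 3.0]), P)
```

All nodes should approach the global optimum 1.5; a poorly mixing network (smaller spectral gap) would need correspondingly more iterations, which is the scaling the paper makes precise.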
Unlabeled Data Improves Adversarial Robustness
- Y. Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi
- Computer Science, Neural Information Processing Systems
- 31 May 2019
It is proved that unlabeled data bridges the complexity gap between standard and robust classification: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy.
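The self-training loop itself is simple: fit on labeled data, pseudo-label the unlabeled pool, refit on the union. A toy sketch with a nearest-centroid classifier (the classifier choice and the two-cluster data are illustrative assumptions; the paper studies robust classifiers, not this model):

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, rounds=5):
    """Toy self-training sketch: fit a nearest-centroid classifier,
    pseudo-label the unlabeled pool, and refit on labeled + pseudo-labeled
    data for a few rounds. Binary labels {0, 1} assumed."""
    X, y = X_lab, y_lab
    for _ in range(rounds):
        centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
        d = np.linalg.norm(X_unlab[:, None, :] - centroids[None], axis=2)
        pseudo = d.argmin(axis=1)            # pseudo-labels for the pool
        X = np.vstack([X_lab, X_unlab])      # refit on the union
        y = np.concatenate([y_lab, pseudo])
    return centroids

rng = np.random.default_rng(0)
X_lab = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal([-2, 0], 0.5, (50, 2)),
                     rng.normal([2, 0], 0.5, (50, 2))])
c = self_train(X_lab, y_lab, X_unlab)
```

The point of the paper is that when the final fit is an adversarially robust training step, the cheap pseudo-labels carry most of the extra sample complexity that robustness would otherwise demand of labeled data.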
Local privacy and statistical minimax rates
- John C. Duchi, Michael I. Jordan, M. Wainwright
- Computer Science, Mathematics, Allerton Conference on Communication, Control…
- 13 February 2013
Bounds on information-theoretic quantities that influence estimation rates as a function of the amount of privacy preserved can be viewed as quantitative data-processing inequalities, allowing a precise characterization of statistical rates under local privacy constraints.
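For concreteness, the simplest locally private mechanism of the kind this line of work analyzes is binary randomized response; the code below is a generic textbook sketch, not an algorithm from the paper. Privatizing bits and then debiasing the reported mean makes the privacy-accuracy trade-off visible: smaller eps means noisier estimates.

```python
import numpy as np

def randomized_response(bits, eps, rng):
    """Binary randomized response: report each bit truthfully with
    probability e^eps / (1 + e^eps), otherwise flip it. This satisfies
    eps-local differential privacy."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    flip = rng.random(bits.shape) >= p
    return np.where(flip, 1 - bits, bits)

def debiased_mean(reports, eps):
    """Unbiased estimate of the true mean from privatized reports,
    inverting E[report] = (2p - 1) * mean + (1 - p)."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (reports.mean() - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
bits = (rng.random(20000) < 0.3).astype(int)   # true mean near 0.3
est = debiased_mean(randomized_response(bits, eps=1.0, rng=rng), eps=1.0)
```

The 1/(2p − 1) inflation factor in the estimator is exactly the sort of privacy-dependent cost whose minimax-optimal form the paper characterizes.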
Efficient Online and Batch Learning Using Forward Backward Splitting
The two-phase approach yields sparse solutions when used in conjunction with regularization functions that promote sparsity, such as l1, l2, squared l2, and l∞ regularization, and is extended and given efficient implementations for very high-dimensional sparse data.
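With l1 regularization, the second phase has a closed form: soft-thresholding. A minimal sketch of one forward-backward splitting step (function name is illustrative):

```python
import numpy as np

def fobos_l1_step(w, grad, eta, lam):
    """One forward-backward splitting step with l1 regularization:
    phase 1 is an unconstrained gradient step, phase 2 is the proximal
    operator of eta * lam * ||.||_1, i.e. coordinate-wise soft-thresholding."""
    v = w - eta * grad                                           # phase 1
    return np.sign(v) * np.maximum(np.abs(v) - eta * lam, 0.0)   # phase 2

w = np.array([0.5, -0.05, 2.0])
w_new = fobos_l1_step(w, grad=np.zeros(3), eta=1.0, lam=0.1)
```

Coordinates whose magnitude falls below the threshold eta * lam are set exactly to zero, which is how the method produces genuinely sparse iterates rather than merely small weights.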
Efficient projections onto the l1-ball for learning in high dimensions
- John C. Duchi, S. Shalev-Shwartz, Y. Singer, Tushar Chandra
- Computer Science, International Conference on Machine Learning
- 5 July 2008
Efficient algorithms for projecting a vector onto the l1-ball are described, and variants of stochastic gradient projection methods augmented with these efficient projection procedures are shown to outperform interior point methods, which are considered state-of-the-art optimization techniques.
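The sort-and-threshold O(d log d) variant of the projection is short enough to sketch directly (the paper also gives an expected-linear-time version, which is omitted here):

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= z} via the
    sort-and-threshold scheme: find the largest rho such that
    u_rho - (cumsum(u)_rho - z) / rho > 0 over sorted magnitudes u,
    then shrink every coordinate by the resulting theta."""
    if np.abs(v).sum() <= z:
        return v.copy()                              # already feasible
    u = np.sort(np.abs(v))[::-1]                     # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)             # optimal shift
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

w = project_l1_ball(np.array([0.8, -0.6, 0.4]), z=1.0)
```

The result lies exactly on the l1-ball boundary whenever the input is infeasible, and small coordinates are zeroed out, which is what makes projected-gradient methods with this step attractive for sparse high-dimensional learning.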
Distributed delayed stochastic optimization
This work exhibits n-node architectures whose optimization error on stochastic problems, despite asynchronous delays, scales asymptotically as O(1/√(nT)) after T iterations, a rate known to be optimal for a distributed system with n nodes even in the absence of delays.
Certifying Some Distributional Robustness with Principled Adversarial Training
- Aman Sinha, Hongseok Namkoong, John C. Duchi
- Computer Science, International Conference on Learning…
- 29 October 2017
This work provides a training procedure that augments model parameter updates with worst-case perturbations of the training data, and efficiently certifies robustness for the population loss by considering a Lagrangian penalty formulation of perturbing the underlying data distribution within a Wasserstein ball.
Generalizing to Unseen Domains via Adversarial Data Augmentation
- Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John C. Duchi, Vittorio Murino, S. Savarese
- Computer Science, Mathematics, Neural Information Processing Systems
- 1 May 2018
This work proposes an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the current model, yielding an adaptive data augmentation method in which adversarial examples are appended at each iteration.
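The augmentation step can be sketched as a few gradient-ascent moves in input space, appending the perturbed copies to the dataset (a bare-bones illustration: the function name, the fixed step size `gamma`, and the caller-supplied `grad_loss_x` oracle are assumptions, and the paper's procedure additionally penalizes distance to the source distribution):

```python
import numpy as np

def adversarial_augment(X, y, grad_loss_x, gamma=0.1, steps=5):
    """Sketch of adversarial data augmentation: push each example a few
    gradient-ascent steps in input space to make it 'hard' for the current
    model, then append the perturbed copies to the dataset.
    grad_loss_x(x, y) is assumed to return d(loss)/dx for the current model."""
    X_adv = X.copy()
    for _ in range(steps):
        for i in range(len(X_adv)):
            X_adv[i] += gamma * grad_loss_x(X_adv[i], y[i])  # ascend the loss
    return np.vstack([X, X_adv]), np.concatenate([y, y])
```

Retraining on the union of original and perturbed examples is what lets the model see a fictitious "hard" target domain at each iteration.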
Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations
- John C. Duchi, Michael I. Jordan, M. Wainwright, Andre Wibisono
- Computer Science, Mathematics, IEEE Transactions on Information Theory
- 7 December 2013
Focusing on non-asymptotic bounds on convergence rates, it is shown that when pairs of function values are available, algorithms for d-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate relative to traditional stochastic gradient methods.
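A two-point estimator of this kind can be sketched as a symmetric difference along a random direction (this particular symmetric-difference form is one common variant; the paper analyzes a family of related estimators):

```python
import numpy as np

def two_point_grad(f, x, delta=1e-4, rng=None):
    """Two-function-evaluation gradient estimate: draw a uniform direction
    u on the sphere and return d * (f(x + delta*u) - f(x - delta*u))
    / (2*delta) * u, which is an (approximately) unbiased estimate of
    grad f(x) since E[u u^T] = I / d."""
    if rng is None:
        rng = np.random.default_rng()
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

f = lambda x: 0.5 * np.dot(x, x)           # true gradient is x
g = two_point_grad(f, np.array([1.0, 0.0, 0.0]),
                   rng=np.random.default_rng(0))
```

A single estimate is noisy, but averaging over draws recovers the gradient; the √d factor in the paper's rates reflects exactly this extra per-query variance relative to exact gradients.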