On the Convergence of Adam and Beyond
- Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
- Computer Science, International Conference on Learning…
- 15 February 2018
This paper shows that one cause of the convergence failures of Adam-style adaptive methods is the exponential moving average they use, and suggests that these issues can be fixed by endowing such algorithms with "long-term memory" of past gradients.
Hashing with Graphs
- W. Liu, Jun Wang, Sanjiv Kumar, Shih-Fu Chang
- Computer Science, International Conference on Machine Learning
- 28 June 2011
This paper proposes a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes and describes a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy.
Adaptive Federated Optimization
- Sashank J. Reddi, Zachary B. Charles, H. B. McMahan
- Computer Science, International Conference on Learning…
- 29 February 2020
This work proposes federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyzes their convergence in the presence of heterogeneous data for general nonconvex settings to highlight the interplay between client heterogeneity and communication efficiency.
Semi-Supervised Hashing for Large-Scale Search
- Jun Wang, Sanjiv Kumar, Shih-Fu Chang
- Computer Science, IEEE Transactions on Pattern Analysis and Machine…
- 1 December 2012
This work proposes a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information-theoretic regularizer over both labeled and unlabeled sets, and presents three different semi-supervised hashing methods: orthogonal hashing, nonorthogonal hashing, and sequential hashing.
A New Baseline for Image Annotation
- A. Makadia, V. Pavlovic, Sanjiv Kumar
- Computer Science, European Conference on Computer Vision
- 12 October 2008
This work introduces a new baseline technique for image annotation that treats annotation as a retrieval problem and outperforms the current state-of-the-art methods on two standard and one large Web dataset.
Face tracking and recognition with visual constraints in real-world videos
- Minyoung Kim, Sanjiv Kumar, V. Pavlovic, H. Rowley
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 23 June 2008
This work addresses the problem of tracking and recognizing faces in real-world, noisy videos using a tracker that adaptively builds a target model reflecting the changes in appearance typical of a video setting, and introduces visual constraints using a combination of generative and discriminative models in a particle filtering framework.
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
- Yang You, Jing Li, Cho-Jui Hsieh
- Computer Science, International Conference on Learning…
- 1 April 2019
The empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning, and the optimizer enables use of very large batch sizes of 32868 without any degradation of performance.
Discrete Graph Hashing
- W. Liu, Cun Mu, Sanjiv Kumar, Shih-Fu Chang
- Computer Science, NIPS
- 8 December 2014
Extensive experiments performed on four large datasets with up to one million samples show that the discrete optimization based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.
Adaptive Methods for Nonconvex Optimization
- M. Zaheer, Sashank J. Reddi, Devendra Singh Sachan, Satyen Kale, Sanjiv Kumar
- Computer Science, Neural Information Processing Systems
- 2018
The result implies that increasing minibatch sizes enables convergence, thus providing a way to circumvent the non-convergence issues, and the paper introduces a new adaptive optimization algorithm, Yogi, which controls the increase in effective learning rate, leading to even better performance with similar theoretical guarantees on convergence.
Sequential Projection Learning for Hashing with Compact Codes
- Jun Wang, Sanjiv Kumar, Shih-Fu Chang
- Computer Science, International Conference on Machine Learning
- 21 June 2010
This paper proposes a novel data-dependent projection learning method such that each hash function is designed to correct the errors made by the previous one sequentially, and shows significant performance gains over the state-of-the-art methods on two large datasets containing up to 1 million points.