Publications
On the Convergence of Adam and Beyond
TLDR
It is shown that one cause of the failure of Adam-style optimizers to converge is the exponential moving average used in the algorithms, and it is suggested that the convergence issues can be fixed by endowing such algorithms with "long-term memory" of past gradients.
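The "long-term memory" fix described here corresponds to the paper's AMSGrad variant, which replaces the usual second-moment estimate with its running maximum. A minimal sketch (hyperparameter names and defaults are illustrative; the paper's version also omits bias correction):

```python
import numpy as np

def amsgrad_step(w, g, m, v, v_hat, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Standard exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # "Long-term memory" of past gradients: keep the elementwise maximum of
    # all second-moment estimates so the effective step size never grows.
    v_hat = np.maximum(v_hat, v)
    w = w - lr * m / (np.sqrt(v_hat) + eps)
    return w, m, v, v_hat
```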
Hashing with Graphs
TLDR
This paper proposes a novel graph-based hashing method that automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes, and describes a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy.
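As one illustration of how a single eigenfunction can yield multiple bits, here is a two-level thresholding sketch; the paper learns its thresholds from the anchor graph, whereas the cut points below (zero, then each side's mean) are purely illustrative:

```python
import numpy as np

def hierarchical_two_bits(y):
    # y: values of one graph eigenfunction over the dataset.
    # First bit: split at zero. Second bit: split each side again at its
    # own mean, so the single eigenfunction contributes two code bits.
    b1 = y > 0
    t_pos = y[b1].mean() if b1.any() else 0.0
    t_neg = y[~b1].mean() if (~b1).any() else 0.0
    b2 = np.where(b1, y > t_pos, y > t_neg)
    return np.stack([b1, b2], axis=1).astype(np.uint8)
```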
Semi-Supervised Hashing for Large-Scale Search
TLDR
This work proposes a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information-theoretic regularizer over both labeled and unlabeled sets, and presents three different semi-supervised hashing methods: orthogonal hashing, nonorthogonal hashing, and sequential hashing.
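A sketch of the relaxed objective behind such a framework, with linear hash functions h(x) = sign(Wᵀx); the pair sets, the variance-style regularizer, and the weight eta are written from the TLDR's description rather than the paper's exact notation:

```python
import numpy as np

def ssh_objective(W, X, pairs_sim, pairs_dis, eta=1.0):
    # Relaxed (un-signed) codes for the whole dataset, labeled or not,
    # assuming zero-centered features X.
    H = X @ W
    # Empirical fit over the labeled pairs: similar pairs should agree
    # in code space, dissimilar pairs should disagree.
    fit = sum(H[i] @ H[j] for i, j in pairs_sim) \
        - sum(H[i] @ H[j] for i, j in pairs_dis)
    # Regularizer over all data: rewards high-variance (balanced) bits.
    reg = eta * np.sum(H ** 2)
    return fit + reg  # maximized in the relaxed formulation
```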
A New Baseline for Image Annotation
TLDR
This work introduces a new baseline technique for image annotation that treats annotation as a retrieval problem and outperforms the current state-of-the-art methods on two standard datasets and one large Web dataset.
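A minimal nearest-neighbor label-transfer sketch in the spirit of "annotation as retrieval"; the Euclidean metric, k, and the vote-counting transfer rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def annotate(query, feats, tags, k=5, n_out=5):
    # Retrieve the k training images nearest to the query in feature space,
    # then transfer the tags occurring most often among those neighbors.
    nearest = np.argsort(np.linalg.norm(feats - query, axis=1))[:k]
    votes = {}
    for i in nearest:
        for t in tags[i]:
            votes[t] = votes.get(t, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:n_out]
```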
Face tracking and recognition with visual constraints in real-world videos
TLDR
This work addresses the problem of tracking and recognizing faces in real-world, noisy videos using a tracker that adaptively builds a target model to reflect the appearance changes typical of a video setting, and it introduces visual constraints through a combination of generative and discriminative models in a particle-filtering framework.
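For context, a generic particle-filtering update of the kind such a tracker builds on; the random-walk motion model and the scoring callback are stand-ins for the paper's adaptive appearance model and its generative/discriminative scores:

```python
import numpy as np

def particle_filter_step(particles, weights, score_fn, noise=2.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Resample face-state hypotheses in proportion to their current weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx] + rng.normal(0.0, noise, particles.shape)
    # Re-weight each hypothesis by how well it matches the target model.
    weights = np.array([score_fn(p) for p in particles])
    return particles, weights / weights.sum()
```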
Discrete Graph Hashing
TLDR
Extensive experiments performed on four large datasets with up to one million samples show that the discrete-optimization-based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.
Adaptive Federated Optimization
TLDR
This work proposes federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyzes their convergence in the presence of heterogeneous data for general nonconvex settings to highlight the interplay between client heterogeneity and communication efficiency.
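A minimal sketch of the server side of such a federated adaptive method, in the style of the paper's FedAdam: clients return model deltas, and the server treats their average as a pseudo-gradient for an Adam-like step (hyperparameter names and defaults here are illustrative):

```python
import numpy as np

def fedadam_server_step(w, client_deltas, m, v,
                        lr=0.1, b1=0.9, b2=0.99, tau=1e-3):
    # The average of client model deltas acts as a pseudo-gradient.
    d = np.mean(client_deltas, axis=0)
    m = b1 * m + (1 - b1) * d
    v = b2 * v + (1 - b2) * d * d
    # Adaptive server update; tau controls the degree of adaptivity.
    w = w + lr * m / (np.sqrt(v) + tau)
    return w, m, v
```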
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
TLDR
The empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning, and the optimizer enables the use of very large batch sizes of 32868 without any degradation of performance.
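The key mechanism in LAMB is a per-layer trust ratio that rescales an Adam-style update by the ratio of the weight norm to the update norm. A simplified per-layer sketch (bias correction and the paper's norm-clipping function are omitted, and the defaults are illustrative):

```python
import numpy as np

def lamb_layer_step(w, g, m, v,
                    lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    # Adam-style first and second moments for this layer's parameters.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    update = m / (np.sqrt(v) + eps) + wd * w
    # Layerwise trust ratio: each layer moves a distance proportional
    # to its own weight norm, which is what keeps huge batches stable.
    trust = np.linalg.norm(w) / max(np.linalg.norm(update), 1e-12)
    return w - lr * trust * update, m, v
```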
Baselines for Image Annotation
TLDR
This work introduces a new and simple baseline technique for image annotation that treats annotation as a retrieval problem and outperforms the current state-of-the-art methods on two standard datasets and one large Web dataset.
Sequential Projection Learning for Hashing with Compact Codes
TLDR
This paper proposes a novel data-dependent projection learning method in which each hash function is learned sequentially to correct the errors made by the previous one, and shows significant performance gains over the state-of-the-art methods on two large datasets containing up to 1 million points.
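As a rough unsupervised stand-in for this sequential error-correcting idea, each projection below is fit to the residual the previous ones left behind; the paper instead reweights pairwise labeled errors, so treat this only as a sketch of the "each function corrects the previous one" structure:

```python
import numpy as np

def sequential_projections(X, n_bits):
    Xc = X - X.mean(axis=0)
    Xr = Xc.copy()
    W = []
    for _ in range(n_bits):
        # Dominant direction of whatever variance is still unexplained.
        _, _, vt = np.linalg.svd(Xr, full_matrices=False)
        w = vt[0]
        W.append(w)
        # Remove the captured direction so the next projection
        # concentrates on what the previous bits got wrong.
        Xr = Xr - np.outer(Xr @ w, w)
    W = np.stack(W, axis=1)
    return np.sign(Xc @ W)  # the compact binary codes
```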
...