Publications
Deep Metric Learning Using Triplet Network
TLDR
This paper proposes the triplet network model, which aims to learn useful representations by distance comparisons, and demonstrates using various datasets that this model learns a better representation than that of its immediate competitor, the Siamese network.
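The core idea can be sketched in a few lines: a shared embedding network maps an anchor, a positive, and a negative example, and the loss pushes the anchor closer to the positive than to the negative. The margin-based loss used below is a common stand-in for illustration; the paper itself compares the two distances through a softmax, so this is not its exact objective.

```python
# Minimal sketch of learning an embedding from (anchor, positive, negative) triplets.
import torch
import torch.nn as nn

# Small illustrative embedding network; architecture and sizes are arbitrary.
embed = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 64))
loss_fn = nn.TripletMarginLoss(margin=1.0)        # margin loss as a stand-in for the paper's softmax comparison
opt = torch.optim.SGD(embed.parameters(), lr=1e-2)

# Dummy triplets standing in for (x, x_same_class, x_other_class).
anchor, positive, negative = (torch.randn(32, 1, 28, 28) for _ in range(3))

opt.zero_grad()
loss = loss_fn(embed(anchor), embed(positive), embed(negative))
loss.backward()
opt.step()
```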
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard-margin SVM) solution.
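As a rough numerical illustration of this result (the data, step size, and iteration counts below are arbitrary choices): under gradient descent on the logistic loss the weight norm keeps growing, while the normalized direction w/||w|| settles down.

```python
# Toy demo: gradient descent on unregularized logistic loss over separable 2-D data.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 50001):
    margins = y * (X @ w)
    sig = 0.5 * (1.0 - np.tanh(margins / 2.0))      # sigmoid(-margin), numerically stable
    w += lr * (X.T @ (y * sig)) / len(y)            # gradient descent step on the logistic loss
    if t % 10000 == 0:
        # Norm keeps growing; the direction w/||w|| stabilizes.
        print(t, np.round(np.linalg.norm(w), 2), np.round(w / np.linalg.norm(w), 3))
```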
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
TLDR
This work proposes a "random walk on a random landscape" statistical model, which exhibits the same "ultra-slow" diffusion behavior observed during training, and presents an algorithm named "Ghost Batch Normalization" that substantially reduces the generalization gap without increasing the number of updates.
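A minimal sketch of the ghost-batch idea, assuming a PyTorch setting: batch-norm statistics are computed over small virtual sub-batches rather than over the full large batch. The module name and chunking details are illustrative, not the paper's reference code.

```python
import torch
import torch.nn as nn

class GhostBatchNorm2d(nn.Module):
    """Applies BatchNorm2d separately to small 'ghost' sub-batches during training."""
    def __init__(self, num_features, virtual_batch_size=32):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        self.bn = nn.BatchNorm2d(num_features)

    def forward(self, x):
        if self.training and x.size(0) > self.virtual_batch_size:
            chunks = x.split(self.virtual_batch_size, dim=0)
            return torch.cat([self.bn(c) for c in chunks], dim=0)
        return self.bn(x)

gbn = GhostBatchNorm2d(16, virtual_batch_size=32)
out = gbn(torch.randn(256, 16, 8, 8))   # statistics computed per 32-sample ghost batch
```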
Scalable Methods for 8-bit Training of Neural Networks
TLDR
This work is the first to quantize the weights, activations, and a substantial part of the gradient stream, in all layers (including batch normalization), to 8-bit, while showing state-of-the-art results on the ImageNet-1K dataset.
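For intuition, a common way to simulate low-precision arithmetic is "fake quantization": tensors are scaled, rounded to the int8 range, and de-quantized. The helper below shows only this generic step, not the paper's full scheme for gradients and batch-norm layers.

```python
import torch

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric per-tensor 8-bit quantization (quantize, then de-quantize)."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    return torch.round(x / scale).clamp(-127, 127) * scale

w = torch.randn(64, 64)
w_q = fake_quant_int8(w)
print((w - w_q).abs().max())   # quantization error is bounded by roughly scale / 2
```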
Norm matters: efficient and accurate normalization schemes in deep networks
TLDR
A novel view of normalization methods and weight decay is presented: they act as tools to decouple the weights' norm from the underlying optimized objective. A modification to weight normalization is also proposed that improves its performance on large-scale tasks.
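The decoupling referred to here is visible in weight normalization's reparameterization w = g · v/‖v‖, where the scale g and the direction v are separate parameters; PyTorch ships a utility for this, used in the short sketch below (the layer sizes are arbitrary).

```python
import torch
import torch.nn as nn

layer = nn.utils.weight_norm(nn.Linear(256, 128))       # reparameterizes weight as g * v / ||v||
print([name for name, _ in layer.named_parameters()])   # includes 'weight_g' (scale) and 'weight_v' (direction)

x = torch.randn(4, 256)
print(layer(x).shape)                                   # forward pass is unchanged: (4, 128)
```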
Task-Agnostic Continual Learning Using Online Variational Bayes With Fixed-Point Updates
TLDR
This work derives novel fixed-point equations for the online variational Bayes optimization problem with multivariate Gaussian parametric distributions, and obtains an algorithm (FOO-VB) that can handle nonstationary data distributions using a fixed architecture and without external memory.
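For orientation only, the sketch below maintains a diagonal-Gaussian posterior over the weights of a toy linear model and updates it with Monte Carlo ELBO gradients; the paper's contribution is to replace such gradient steps with closed-form fixed-point updates, which are not reproduced here.

```python
import torch

d = 10
mu = torch.zeros(d, requires_grad=True)            # posterior mean
rho = torch.full((d,), -3.0, requires_grad=True)   # sigma = softplus(rho) > 0
opt = torch.optim.SGD([mu, rho], lr=1e-2)

def vb_step(x, y):
    sigma = torch.nn.functional.softplus(rho)
    w = mu + sigma * torch.randn(d)                # reparameterization sample of the weights
    nll = torch.nn.functional.mse_loss(x @ w, y)   # data term (Gaussian likelihood up to constants)
    kl = 0.5 * (sigma**2 + mu**2 - 2 * torch.log(sigma) - 1).sum()  # KL to N(0, I) prior
    opt.zero_grad()
    (nll + 1e-3 * kl).backward()
    opt.step()

vb_step(torch.randn(32, d), torch.randn(32))
```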
ACIQ: Analytical Clipping for Integer Quantization of neural networks
TLDR
This work analyzes the statistics of the various tensors in low-precision networks, derives exact expressions for the mean-square-error degradation caused by clipping, and shows marked improvements over standard quantization schemes, which normally avoid clipping.
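To make the clipping trade-off concrete, the snippet below picks the clipping threshold that minimizes quantization mean-square error by brute-force search over candidates; ACIQ instead derives this threshold analytically from the tensor's statistics, so treat the search as a stand-in.

```python
import torch

def quant_mse(x, clip, num_bits=4):
    """MSE after clipping to [-clip, clip] and rounding to 2**num_bits - 1 uniform levels."""
    levels = 2 ** num_bits - 1
    scale = 2 * clip / levels
    xq = torch.round(x.clamp(-clip, clip) / scale) * scale
    return ((x - xq) ** 2).mean()

x = torch.randn(100_000)                           # roughly Gaussian activations
clips = torch.linspace(0.5, 4.0, 36)
best = min(clips, key=lambda c: quant_mse(x, c).item())
print(f"best clip = {best.item():.2f}")            # well below max|x|: some clipping reduces MSE
```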
Fix your classifier: the marginal value of training the last weight layer
TLDR
This work argues that the final classification layer can be fixed, up to a global scale constant, with little or no loss of accuracy on most tasks, yielding memory and computational benefits.
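A minimal sketch of a fixed classifier, assuming a PyTorch model: the last linear layer is initialized to a fixed projection and frozen, and only a single global scale is learned. The orthogonal initialization here is an illustrative choice.

```python
import torch
import torch.nn as nn

num_features, num_classes = 512, 1000
classifier = nn.Linear(num_features, num_classes, bias=False)
nn.init.orthogonal_(classifier.weight)            # any fixed projection works for the sketch
classifier.weight.requires_grad_(False)           # never updated; no gradients or optimizer state
scale = nn.Parameter(torch.tensor(10.0))          # the only learned quantity in the classifier

features = torch.randn(8, num_features)
logits = scale * classifier(features)
```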
Augment Your Batch: Improving Generalization Through Instance Repetition
TLDR
The results show that batch augmentation reduces the number of SGD updates needed to reach the same accuracy as the state of the art, and enables faster training and better generalization by allowing more computational resources to be used concurrently.
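A sketch of batch augmentation under assumed torchvision transforms: each sampled batch is repeated several times, and every repetition passes through the random augmentation independently, so a single parameter update sees multiple augmented views per instance. The repeat factor and transforms are arbitrary choices.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
])

def augment_batch(images, labels, repeats=4):
    """Repeat the batch `repeats` times; each copy is augmented with fresh random parameters."""
    views = torch.cat([augment(images) for _ in range(repeats)], dim=0)
    return views, labels.repeat(repeats)

x, y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
xb, yb = augment_batch(x, y)                      # batch of 256 = 64 instances x 4 views
```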
The Knowledge Within: Methods for Data-Free Model Compression
TLDR
This work presents three methods for generating synthetic samples from trained models and demonstrates how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process, alleviating the need for training data during model deployment.
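One way to picture data-free sample generation, loosely following the statistics-matching idea: random inputs are optimized so that per-layer batch statistics match the batch-norm running statistics stored in the trained model. The sketch below is an assumption-laden illustration (untrained ResNet-18, arbitrary sizes and step counts), not the paper's exact procedure.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18().eval()                         # a pretrained model would normally be used here
bn_losses = []

def match_bn_stats(module, inputs, output):
    """Penalize mismatch between the current batch statistics and the stored running statistics."""
    x = inputs[0]
    mean = x.mean(dim=(0, 2, 3))
    var = x.var(dim=(0, 2, 3), unbiased=False)
    bn_losses.append(((mean - module.running_mean) ** 2).sum()
                     + ((var - module.running_var) ** 2).sum())

for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.register_forward_hook(match_bn_stats)

images = torch.randn(8, 3, 64, 64, requires_grad=True)
opt = torch.optim.Adam([images], lr=0.1)
for _ in range(10):                               # a few illustrative optimization steps
    bn_losses.clear()
    opt.zero_grad()
    model(images)
    torch.stack(bn_losses).sum().backward()       # gradients flow into the synthetic images
    opt.step()
```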