Publications
Word Translation Without Parallel Data
TLDR: It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
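A minimal sketch of the alignment step (not the paper's full pipeline): given a set of paired word vectors, for example produced by the unsupervised matching stage, the source space can be mapped onto the target space with an orthogonal (Procrustes) transform obtained in closed form from an SVD. The names `X_src` and `Y_tgt` and the toy data are illustrative.

```python
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """Orthogonal mapping W minimizing ||X_src @ W - Y_tgt||_F.
    X_src, Y_tgt: (n, d) arrays of paired word embeddings."""
    # Closed-form solution via the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Toy usage with random vectors standing in for word embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))
W_true, _ = np.linalg.qr(rng.normal(size=(300, 300)))  # a hidden rotation
Y = X @ W_true
W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))  # True: the rotation is recovered
```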
Training data-efficient image transformers & distillation through attention
TLDR: This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token ensuring that the student learns from the teacher through attention.
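A hedged sketch of the hard-distillation objective described here: the student carries an extra distillation token whose prediction is trained against the teacher's hard decisions, while the class token is trained against the ground-truth labels. Variable names are illustrative, not the repository's code.

```python
import torch
import torch.nn.functional as F

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    """cls_logits:     student predictions from the class token
    dist_logits:    student predictions from the distillation token
    teacher_logits: predictions of the (typically convnet) teacher
    targets:        ground-truth labels"""
    teacher_labels = teacher_logits.argmax(dim=1)              # "hard" teacher decisions
    loss_cls = F.cross_entropy(cls_logits, targets)            # class token learns from labels
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)   # distillation token learns from the teacher
    return 0.5 * (loss_cls + loss_dist)
```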
Emerging Properties in Self-Supervised Vision Transformers
TLDR: This paper asks whether self-supervised learning provides new properties to Vision Transformers (ViT) that stand out compared to convolutional networks (convnets), and introduces DINO, a simple self-supervised method that takes the form of self-distillation with no labels.
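A minimal sketch of the self-distillation mechanism, under the assumption of a student and a teacher sharing the same architecture: the teacher is an exponential moving average of the student, and the student is trained to match the teacher's centered, sharpened output distribution.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(student, teacher, momentum=0.996):
    """EMA update: the teacher's weights track a moving average of the student's."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1 - momentum)

def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between teacher and student distributions over K output dimensions.
    `center` is a running mean of teacher outputs used to avoid collapse."""
    t = F.softmax((teacher_out - center) / tau_t, dim=-1).detach()  # centered, sharpened teacher
    log_s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * log_s).sum(dim=-1).mean()
```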
Going deeper with Image Transformers
TLDR: This work builds and optimizes deeper transformer networks for image classification, investigates the interplay of architecture and optimization in such dedicated transformers, and makes two architecture changes that significantly improve the accuracy of deep image transformers.
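One of the architecture changes is LayerScale, a learnable per-channel scaling of each residual branch initialized to a small value, which stabilizes the optimization of deep transformers. A minimal sketch, with `branch` standing in for the attention or MLP sub-block:

```python
import torch
import torch.nn as nn

class LayerScaleBlock(nn.Module):
    """Residual block whose branch output is multiplied by a learnable
    per-channel diagonal, initialized to a small constant (sketch)."""
    def __init__(self, dim, branch, init_value=1e-4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.branch = branch                              # e.g. self-attention or MLP
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        return x + self.gamma * self.branch(self.norm(x))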
ResMLP: Feedforward networks for image classification with data-efficient training
TLDR: ResMLP is a simple residual network that alternates a linear layer, in which image patches interact independently and identically across channels, with a two-layer feed-forward network, in which channels interact independently per patch.
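A sketch of one ResMLP block under that description: a cross-patch linear layer applied per channel, followed by a per-patch MLP applied per channel dimension, with simple affine transforms in place of normalization. Hyperparameters and omitted details (e.g. residual-branch scaling) follow common choices rather than the exact released code.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Per-channel affine transform used instead of normalization."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    """One block: cross-patch linear layer, then per-patch feed-forward network."""
    def __init__(self, dim, num_patches, expansion=4):
        super().__init__()
        self.aff1 = Affine(dim)
        self.cross_patch = nn.Linear(num_patches, num_patches)   # mixes patches, per channel
        self.aff2 = Affine(dim)
        self.mlp = nn.Sequential(                                # mixes channels, per patch
            nn.Linear(dim, expansion * dim), nn.GELU(), nn.Linear(expansion * dim, dim))

    def forward(self, x):                        # x: (batch, num_patches, dim)
        x = x + self.cross_patch(self.aff1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.mlp(self.aff2(x))
        return x
```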
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
TLDR: This work designs a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime, and proposes LeViT, a hybrid neural network for fast-inference image classification.
Fixing the train-test resolution discrepancy: FixEfficientNet
TLDR: This train-test resolution fixing strategy is advantageously combined with recent training recipes from the literature, significantly outperforms the initial architecture with the same number of parameters, and establishes a new state of the art for ImageNet with a single crop.
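A hedged sketch of the underlying idea: after normal training at a lower resolution, only the final layers are fine-tuned on images preprocessed at the test resolution, so that train-time and test-time object sizes match. The helper name, the `.fc` classifier attribute, and the choice of layers to adapt are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def finetune_at_test_resolution(model, loader_high_res, lr=1e-3, steps=1000):
    """Fine-tune the classifier (and BatchNorm statistics) at the test resolution."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():          # assumes a torchvision-style model with `.fc`
        p.requires_grad = True
    model.train()                            # lets BatchNorm re-estimate its statistics
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    step = 0
    for images, labels in loader_high_res:   # loader crops images at the *test* resolution
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
        step += 1
        if step >= steps:
            break
    return model
```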
ResNet strikes back: An improved training procedure in timm
TLDR: This paper re-evaluates the performance of the vanilla ResNet-50 when trained with a procedure that integrates recent training advances, and shares competitive training settings and pre-trained models in the timm open-source library, in the hope that they will serve as better baselines for future work.
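Since the pre-trained models are distributed through timm, a minimal usage sketch with timm's public API follows; which pretrained weight tag the default resolves to depends on the timm version, so check the model card for the recipe actually loaded.

```python
import timm
from timm.data import resolve_data_config, create_transform

# Load a pretrained ResNet-50 from timm (the default weight tag may or may not
# correspond to the improved-recipe checkpoints; consult timm's model card).
model = timm.create_model("resnet50", pretrained=True).eval()

# Build the preprocessing pipeline that matches the loaded weights.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)
print(config["input_size"], config["interpolation"])
```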
Training Vision Transformers for Image Retrieval
TLDR: This work adopts vision transformers for generating image descriptors and trains the resulting model with a metric learning objective that combines a contrastive loss with a differential entropy regularizer, showing consistent and significant improvements of transformers over convolution-based approaches.
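A sketch of the objective's general shape, assuming L2-normalized descriptors: an in-batch contrastive loss plus a differential-entropy (KoLeo-style) term that maximizes each descriptor's log-distance to its nearest neighbor, spreading descriptors over the sphere. The exact losses, margins, and weights in the paper may differ; names here are illustrative.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(z, eps=1e-8):
    """L2 distances between L2-normalized descriptors, via cosine similarity."""
    sim = z @ z.t()
    return (2.0 - 2.0 * sim).clamp(min=eps).sqrt()

def koleo_entropy_regularizer(z):
    """Entropy term: maximize the log-distance to each point's nearest neighbor."""
    d = pairwise_distances(z)
    mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    nn_dist = d.masked_fill(mask, float("inf")).min(dim=1).values
    return -torch.log(nn_dist).mean()

def retrieval_loss(z, labels, margin=0.5, lambda_reg=1.0):
    """Contrastive loss over in-batch pairs plus the entropy regularizer (sketch)."""
    z = F.normalize(z, dim=1)
    d = pairwise_distances(z)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = d[same & ~eye]                      # matching pairs: pulled together
    neg = F.relu(margin - d[~same])           # non-matching pairs: pushed beyond the margin
    contrastive = pos.pow(2).mean() + neg.pow(2).mean()
    return contrastive + lambda_reg * koleo_entropy_regularizer(z)
```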
Radioactive data: tracing through training
TLDR: A new technique is proposed that makes imperceptible changes to a dataset such that any model trained on it will bear an identifiable mark, robust to data augmentation and the stochasticity of deep network optimization.