Publications
Enriching Word Vectors with Subword Information
We propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams; words are then represented as the sum of the n-gram vectors.
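The subword idea above can be sketched in a few lines. This is a minimal illustration, not the fastText implementation: the n-gram range (3 to 6), the table size, and the CRC32 hashing are stand-ins chosen for the example.

```python
import zlib
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # fastText wraps each word in boundary markers "<" and ">"
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

# hypothetical tiny embedding table: each n-gram is hashed into a row
rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 8))

def word_vector(word):
    # a word's vector is the sum of its character n-gram vectors
    return sum(table[zlib.crc32(g.encode()) % 1000]
               for g in char_ngrams(word))
```

Because the vector is built from shared character n-grams, rare and unseen words still get meaningful representations.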
Bag of Tricks for Efficient Text Classification
This paper explores a simple and efficient baseline for text classification.
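The baseline in question (fastText's classifier) averages word vectors and feeds the result to a linear layer. A minimal sketch, with a hypothetical toy vocabulary and randomly initialized weights standing in for trained ones:

```python
import numpy as np

# hypothetical toy setup: a 3-word vocabulary and two classes
vocab = {"good": 0, "movie": 1, "bad": 2}
rng = np.random.default_rng(1)
emb = rng.standard_normal((len(vocab), 4))   # word embeddings
W = rng.standard_normal((4, 2))              # linear classifier weights

def predict(tokens):
    # average the word vectors, then apply a single linear layer
    h = np.mean([emb[vocab[t]] for t in tokens], axis=0)
    return int(np.argmax(h @ W))
```

Despite its simplicity, this bag-of-embeddings architecture trains orders of magnitude faster than deep models while remaining competitive on many classification benchmarks.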
Deep Clustering for Unsupervised Learning of Visual Features
We present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features and uses the subsequent assignments as supervision to update the weights of the network.
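The alternation described above can be sketched as follows. The features here are random stand-ins for what the network would produce, and the k-means routine is a plain implementation, not the paper's code:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    # plain k-means, standing in for the clustering step
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# stand-in for features extracted by the current network
X = np.random.default_rng(3).standard_normal((100, 16))
pseudo_labels = kmeans(X, k=5)
# ...in DeepCluster these pseudo-labels would supervise a gradient
# update of the network, and the two steps alternate each epoch
```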
Learning Word Vectors for 157 Languages
We train high-quality word vectors on Wikipedia and the Common Crawl, and introduce three new word analogy datasets to evaluate these word vectors.
Parseval Networks: Improving Robustness to Adversarial Examples
We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1.
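The constraint is enforced with a retraction step applied to each weight matrix after the gradient update. A minimal sketch of that step; the step size and iteration count here are illustrative, and in practice only one or two retraction steps are taken per gradient update:

```python
import numpy as np

def parseval_retraction(W, beta=0.05, iters=500):
    # retraction step: W <- (1 + beta) W - beta W W^T W.
    # Iterating it drives every singular value of W toward 1, so the
    # layer's Lipschitz constant (largest singular value) approaches 1.
    for _ in range(iters):
        W = (1 + beta) * W - beta * (W @ W.T @ W)
    return W

W = 0.3 * np.random.default_rng(4).standard_normal((6, 4))
W = parseval_retraction(W)
s = np.linalg.svd(W, compute_uv=False)  # singular values near 1
```

Keeping the Lipschitz constant near 1 limits how much a small input perturbation can be amplified layer by layer, which is what yields the robustness to adversarial examples.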
Advances in Pre-Training Distributed Word Representations
We show how to train high-quality word vector representations by combining known tricks that are rarely used together.
FastText.zip: Compressing text classification models
We propose a text classifier, derived from the fastText approach, which at test time requires only a fraction of the memory compared to the original one, without noticeably sacrificing accuracy.
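The memory savings come largely from compressing the embedding matrix with product quantization. A toy sketch of that idea, assuming hypothetical sizes (2 sub-vectors, 4 centroids) far smaller than a real codebook:

```python
import numpy as np

def pq_compress(E, n_sub=2, k=4, seed=0):
    # product quantization sketch: split each embedding into n_sub
    # sub-vectors and replace each sub-vector by the index of the
    # nearest of k centroids, so storage drops from floats to codes
    rng = np.random.default_rng(seed)
    d = E.shape[1] // n_sub
    codes, books = [], []
    for s in range(n_sub):
        sub = E[:, s * d:(s + 1) * d]
        centers = sub[rng.choice(len(sub), k, replace=False)].copy()
        for _ in range(10):  # a few k-means refinement steps
            lab = np.argmin(((sub[:, None] - centers) ** 2).sum(-1), axis=1)
            for j in range(k):
                if (lab == j).any():
                    centers[j] = sub[lab == j].mean(axis=0)
        codes.append(lab)
        books.append(centers)
    return np.stack(codes, axis=1), books

def pq_decompress(codes, books):
    # approximate each embedding by concatenating its centroids
    return np.concatenate([books[s][codes[:, s]]
                           for s in range(len(books))], axis=1)

E = np.random.default_rng(6).standard_normal((50, 8))
codes, books = pq_compress(E)
E_hat = pq_decompress(codes, books)
```

Each row of the original matrix is now stored as a couple of small integer codes plus a shared codebook, rather than a full vector of floats.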
Colorless green recurrent networks dream hierarchically
We test whether RNNs trained with a generic language modeling objective in four languages can predict long-distance number agreement in various constructions.
Weakly Supervised Action Labeling in Videos under Ordering Constraints
We propose a weakly supervised temporal assignment method with ordering constraints that both localizes the individual actions in each clip and learns a discriminative classifier for each action.
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion.
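The setting is bilingual word translation: learn a linear map from source to target embeddings, then translate by nearest-neighbor retrieval. A toy sketch of the classic least-squares baseline the paper improves on, with hypothetical random embeddings and gold pairs (i, i):

```python
import numpy as np

# hypothetical toy setup: source/target embeddings with gold pairs (i, i)
rng = np.random.default_rng(5)
src = rng.standard_normal((20, 4))
tgt = rng.standard_normal((20, 4))

# baseline: fit the linear map W by least squares on the gold pairs;
# the paper instead optimizes the retrieval criterion end to end
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

def translate(i):
    # translate source word i by cosine nearest-neighbor retrieval
    q = src[i] @ W
    sims = tgt @ q / (np.linalg.norm(tgt, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))
```

The paper's point is the mismatch sketched here: the map is trained with a least-squares loss but evaluated by retrieval, so optimizing the retrieval criterion directly gives better translations.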