• Publications
Enriching Word Vectors with Subword Information
tl;dr: We propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams; a word's vector is the sum of its n-gram vectors.
  • Citations: 3,516 • Influence: 577
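The subword idea in the tl;dr above can be sketched in a few lines. Everything here is an illustrative assumption (the function name, the default n-gram range of 3 to 6, and the `<`/`>` boundary symbols follow common descriptions of the approach, not the paper's actual code):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Bag of character n-grams for one word (illustrative sketch).

    The word is wrapped in boundary symbols '<' and '>', every n-gram
    with n_min <= n <= n_max is collected, and the whole wrapped word
    is added as one extra gram. A word vector is then the sum of the
    vectors associated with these grams.
    """
    wrapped = f"<{word}>"
    grams = {wrapped}  # special sequence for the whole word
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams

# With n = 3 only, "where" yields <wh, whe, her, ere, re>, plus <where>.
print(sorted(char_ngrams("where", 3, 3)))
```

Because rare and out-of-vocabulary words share n-grams with frequent words, they still receive meaningful vectors under this scheme.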
Bag of Tricks for Efficient Text Classification
tl;dr: This paper explores a simple and efficient baseline for text classification.
  • Citations: 1,817 • Influence: 219
Deep Clustering for Unsupervised Learning of Visual Features
tl;dr: We present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features, then uses those assignments as supervision to update the network's weights.
  • Citations: 407 • Influence: 65
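The alternation described in the DeepCluster tl;dr can be sketched as follows. The function name, the plain Lloyd-style k-means, and the deterministic seeding are all illustrative simplifications, not the actual implementation:

```python
import numpy as np

def pseudo_labels(features, k, iters=10):
    """One DeepCluster-style round (illustrative sketch): cluster the
    features produced by the current network and return the cluster
    assignments, which then serve as pseudo-labels for an ordinary
    supervised update of the network weights."""
    # Deterministic seeding: k centers spread across the feature array.
    centers = features[np.linspace(0, len(features) - 1, k).astype(int)]
    for _ in range(iters):
        # Assign each feature to its nearest center (Lloyd's algorithm).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Recompute each center as the mean of its assigned features.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return labels
```

The key design point is the loop: better features give better clusters, and training on the cluster pseudo-labels in turn gives better features.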
Learning Word Vectors for 157 Languages
tl;dr: We train high-quality word vectors on Wikipedia and the Common Crawl, and introduce three new word analogy datasets to evaluate these vectors.
  • Citations: 409 • Influence: 56
Parseval Networks: Improving Robustness to Adversarial Examples
tl;dr: We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional, and aggregation layers is constrained to be smaller than 1.
  • Citations: 354 • Influence: 51
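The Parseval constraint is typically maintained with a cheap orthonormality retraction applied after each gradient step; the sketch below shows that update on a raw weight matrix (the function name and the `beta` value are illustrative choices):

```python
import numpy as np

def parseval_retraction(W, beta=0.001, steps=1):
    """Pull the rows of a weight matrix toward an orthonormal set,
    which bounds the layer's Lipschitz constant (spectral norm) by 1.
    Retraction (illustrative sketch): W <- (1 + beta) W - beta * W W^T W.
    """
    for _ in range(steps):
        W = (1 + beta) * W - beta * (W @ W.T @ W)
    return W
```

Each singular value sigma is mapped to sigma * (1 + beta * (1 - sigma^2)), so repeated small steps drive the singular values toward 1 and no layer can amplify an adversarial perturbation.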
Advances in Pre-Training Distributed Word Representations
tl;dr: We show how to train high-quality word vector representations by combining known tricks that are rarely used together.
  • Citations: 429 • Influence: 41
FastText.zip: Compressing text classification models
tl;dr: We propose a text classifier, derived from the fastText approach, which at test time requires only a fraction of the memory compared to the original one, without noticeably sacrificing accuracy.
  • Citations: 299 • Influence: 36
Colorless green recurrent networks dream hierarchically
tl;dr: We test whether RNNs trained with a generic language-modeling objective in four languages can predict long-distance number agreement in various constructions.
  • Citations: 165 • Influence: 33
Weakly Supervised Action Labeling in Videos under Ordering Constraints
tl;dr: We propose a weakly supervised temporal assignment method with ordering constraints that localizes the individual actions in each clip and learns a discriminative classifier for each action.
  • Citations: 142 • Influence: 29
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
tl;dr: In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion.
  • Citations: 92 • Influence: 25
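The retrieval criterion referenced in the last tl;dr builds on CSLS, which corrects cosine similarity for hubness in nearest-neighbor word translation. A minimal scoring sketch (function name and the default `k` are illustrative; the paper optimizes a relaxed version of this criterion rather than only scoring with it):

```python
import numpy as np

def csls_scores(X, Y, k=10):
    """CSLS between source embeddings X and target embeddings Y
    (illustrative sketch):
        CSLS(x, y) = 2 cos(x, y) - r_Y(x) - r_X(y)
    where r_Y(x) is the mean cosine similarity of x to its k nearest
    targets, and r_X(y) likewise from y toward the sources."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    S = Xn @ Yn.T                                  # all cosine similarities
    r_x = np.sort(S, axis=1)[:, -k:].mean(axis=1)  # hubness term per source
    r_y = np.sort(S, axis=0)[-k:, :].mean(axis=0)  # hubness term per target
    return 2 * S - r_x[:, None] - r_y[None, :]
```

The penalty terms lower the score of "hub" vectors that are close to everything, which plain cosine retrieval tends to over-select.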