Publications
Enriching Word Vectors with Subword Information
TLDR
A new approach based on the skipgram model in which each word is represented as a bag of character n-grams and the word vector is the sum of these n-gram representations, achieving state-of-the-art performance on word similarity and analogy tasks.
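As a rough illustration of the subword idea in this summary, here is a minimal Python sketch (names, dimensions, and the on-the-fly initialization are illustrative, not the fastText implementation): a word's vector is the sum of the vectors of its character n-grams.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams of a word, padded with boundary symbols."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def word_vector(word, ngram_table, dim=100):
    """Word vector = sum of the vectors of its character n-grams.

    `ngram_table` maps an n-gram to its embedding; unseen n-grams are
    initialized on the fly here purely for illustration."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g not in ngram_table:
            ngram_table[g] = np.random.randn(dim) * 0.01
        vec += ngram_table[g]
    return vec

table = {}
print(char_ngrams("where", 3, 4))         # ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ...]
print(word_vector("where", table).shape)  # (100,)
```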
Bag of Tricks for Efficient Text Classification
TLDR
A simple and efficient baseline for text classification showing that the fastText classifier is often on par with deep learning classifiers in accuracy while being many orders of magnitude faster to train and evaluate.
Unsupervised Cross-lingual Representation Learning at Scale
TLDR
Shows that pretraining multilingual language models at scale leads to significant performance gains on a wide range of cross-lingual transfer tasks, and demonstrates for the first time the possibility of multilingual modeling without sacrificing per-language performance.
Learning Word Vectors for 157 Languages
TLDR
Describes how high-quality word representations for 157 languages were trained on Wikipedia and data from the Common Crawl project, and introduces three new word analogy datasets to evaluate these word vectors.
Parseval Networks: Improving Robustness to Adversarial Examples
TLDR
Shows that Parseval networks match the state of the art in accuracy on CIFAR-10/100 and Street View House Numbers while being more robust than their vanilla counterparts against adversarial examples.
FastText.zip: Compressing text classification models
TLDR
Proposes a method built on product quantization to store word embeddings, producing a text classifier derived from the fastText approach that at test time requires only a fraction of the memory of the original model, without noticeably sacrificing classification accuracy.
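The summary names product quantization as the compression mechanism; below is a generic Python sketch of product quantization applied to an embedding matrix (using scikit-learn k-means, not the FastText.zip code): each embedding is split into blocks, and each block is stored as the index of its nearest centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train(emb, n_subvectors=4, n_centroids=256):
    """Train one codebook per sub-vector block of the embedding matrix."""
    block = emb.shape[1] // n_subvectors
    codebooks, codes = [], []
    for s in range(n_subvectors):
        sub = emb[:, s * block:(s + 1) * block]
        km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)
        codes.append(km.predict(sub).astype(np.uint8))  # 1 byte per block
    return codebooks, np.stack(codes, axis=1)

def pq_decode(codebooks, codes):
    """Reconstruct approximate embeddings from the stored integer codes."""
    return np.hstack([codebooks[s][codes[:, s]] for s in range(codes.shape[1])])

# Toy example: 1000 "word embeddings" of dimension 64 stored as 4 bytes each.
emb = np.random.randn(1000, 64).astype(np.float32)
codebooks, codes = pq_train(emb, n_subvectors=4, n_centroids=16)
approx = pq_decode(codebooks, codes)
print(codes.shape, approx.shape)  # (1000, 4) (1000, 64)
```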
Advances in Pre-Training Distributed Word Representations
TLDR
Shows how to train high-quality word vector representations using a combination of known but rarely combined tricks, outperforming the current state of the art by a large margin on a number of tasks.
Colorless green recurrent networks dream hierarchically
TLDR
Brings support to the hypothesis that RNNs are not just shallow pattern extractors but also acquire deeper grammatical competence, making reliable predictions about long-distance agreement and not lagging far behind human performance.
Improving Neural Language Models with a Continuous Cache
TLDR
A simplified version of memory-augmented networks that stores past hidden activations as memory and accesses them through a dot product with the current hidden activation; it is very efficient and scales to very large memory sizes.
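A minimal Python sketch of the cache mechanism described in this summary (hyperparameters and names are illustrative): past hidden states and the words that followed them are stored; a cache distribution over the vocabulary is computed from dot products with the current hidden state and interpolated with the base language-model distribution.

```python
import numpy as np

def cache_probs(h_t, cache_h, cache_words, vocab_size, theta=0.3):
    """Cache distribution: softmax over dot products of the current hidden
    state with stored hidden states, mass assigned to the stored next words."""
    scores = np.exp(theta * cache_h @ h_t)
    scores /= scores.sum()
    p = np.zeros(vocab_size)
    for w, s in zip(cache_words, scores):
        p[w] += s
    return p

def interpolate(p_model, p_cache, lam=0.2):
    """Final next-word distribution mixes the base LM with the cache."""
    return (1 - lam) * p_model + lam * p_cache

# Toy usage: 5 cached states of dimension 8, vocabulary of 20 words.
rng = np.random.default_rng(0)
cache_h = rng.standard_normal((5, 8))
cache_words = [3, 7, 3, 11, 5]      # word that followed each cached state
h_t = rng.standard_normal(8)        # current hidden state
p_model = np.full(20, 1 / 20)       # uniform base LM, for illustration only
p = interpolate(p_model, cache_probs(h_t, cache_h, cache_words, 20))
print(p.sum())  # ~1.0
```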
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
TLDR
Proposes a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion, and shows that this approach outperforms the state of the art on word translation.
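The retrieval criterion referred to here is of the CSLS type; the Python sketch below shows such a criterion used for retrieval only (illustrative code with made-up data, it does not reproduce the paper's end-to-end optimization of the criterion while training the mapping).

```python
import numpy as np

def csls_scores(mapped_src, tgt, k=10):
    """CSLS-style retrieval criterion: cosine similarity corrected by the
    average similarity of each vector to its k nearest neighbours on the
    other side, which penalizes hub vectors. `mapped_src` holds source
    embeddings already mapped into the target space; both matrices are
    assumed row-normalized."""
    sims = mapped_src @ tgt.T                                  # cosines
    r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return 2 * sims - r_src - r_tgt

def translate(mapped_src, tgt, k=10):
    """Pick, for every source word, the target word with the best CSLS score."""
    return csls_scores(mapped_src, tgt, k).argmax(axis=1)

# Toy usage with random normalized embeddings (illustration only).
rng = np.random.default_rng(0)
def normalize(m): return m / np.linalg.norm(m, axis=1, keepdims=True)
src = normalize(rng.standard_normal((50, 16)))   # pretend these are W @ x
tgt = normalize(rng.standard_normal((80, 16)))
print(translate(src, tgt, k=5)[:10])
```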