Enriching Word Vectors with Subword Information
A new approach based on the skipgram model, where each word is represented as a bag of character n-grams and a word's vector is the sum of these n-gram representations, which achieves state-of-the-art performance on word similarity and analogy tasks.
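A minimal sketch of the paper's core idea: a word's vector is the sum of the vectors of its character n-grams (3 to 6 characters, with the word wrapped in boundary symbols and also kept as a feature of its own). The `ngram_vectors` lookup table here is a hypothetical stand-in for trained n-gram embeddings.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of the word wrapped in boundary symbols,
    plus the full word itself as its own feature."""
    w = f"<{word}>"
    grams = [w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)]
    return grams + [w]

def word_vector(word, ngram_vectors, dim=300):
    """Sum the embeddings of all of the word's n-grams; n-grams absent
    from the table contribute nothing, so unseen words still get vectors."""
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        vec += ngram_vectors.get(g, 0.0)
    return vec
```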
Unsupervised Cross-lingual Representation Learning at Scale
It is shown that pretraining multilingual language models at scale leads to significant performance gains on a wide range of cross-lingual transfer tasks and, for the first time, that multilingual modeling is possible without sacrificing per-language performance.
Bag of Tricks for Efficient Text Classification
A simple and efficient baseline for text classification is explored, showing that the fastText classifier is often on par with deep learning classifiers in accuracy while being many orders of magnitude faster for training and evaluation.
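For reference, a minimal usage sketch of the released fastText library's supervised classifier; the file names and hyperparameter values are placeholders, and the `__label__` prefix follows the library's input convention.

```python
import fasttext

# train.txt holds one example per line, e.g. "__label__positive great movie ..."
model = fasttext.train_supervised(input="train.txt", epoch=5, lr=0.5,
                                  wordNgrams=2)  # word bigram features

print(model.predict("this film was surprisingly good"))  # (labels, probabilities)
print(model.test("valid.txt"))  # (n_examples, precision@1, recall@1)
```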
Learning Word Vectors for 157 Languages
- Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov
- Computer Science · LREC
- 19 February 2018
This paper describes how high-quality word representations for 157 languages were trained on the free online encyclopedia Wikipedia and data from the Common Crawl project, and introduces three new word analogy datasets to evaluate these word vectors.
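A short sketch of loading the released vectors with the fastText Python bindings; the language code 'fi' and the queried word are arbitrary examples.

```python
import fasttext
import fasttext.util

# Download the pre-trained vectors released with the paper; 'fi' (Finnish)
# is just one example of the 157 available language codes.
fasttext.util.download_model('fi', if_exists='ignore')  # fetches cc.fi.300.bin
ft = fasttext.load_model('cc.fi.300.bin')

print(ft.get_word_vector('talo').shape)       # (300,) subword-based vector
print(ft.get_nearest_neighbors('talo', k=5))  # nearest words by cosine similarity
```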
FastText.zip: Compressing text classification models
- Armand Joulin, Edouard Grave, Piotr Bojanowski, M. Douze, H. Jégou, Tomas Mikolov
- Computer Science · arXiv
- 12 December 2016
This work proposes a method built upon product quantization to store the word embeddings, producing a text classifier derived from the fastText approach that at test time requires only a fraction of the memory of the original model, without noticeably sacrificing classification accuracy.
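A minimal sketch of product quantization over an embedding matrix, the compression primitive the paper builds on (not the paper's actual implementation): each vector is split into sub-vectors, each subspace is quantized with k-means, and only one-byte centroid indices are stored. Assumes the embedding dimension is divisible by `n_subvectors`.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, n_subvectors=4, n_centroids=256):
    """Fit one k-means codebook per sub-vector block of the embedding
    matrix X of shape (n_words, dim)."""
    d = X.shape[1] // n_subvectors
    codebooks, codes = [], []
    for s in range(n_subvectors):
        block = X[:, s * d:(s + 1) * d]
        km = KMeans(n_clusters=n_centroids, n_init=4).fit(block)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))  # 1 byte per sub-vector
    return codebooks, np.stack(codes, axis=1)

def pq_decode(codebooks, codes):
    """Reconstruct approximate embeddings from stored centroid indices."""
    return np.hstack([cb[codes[:, s]] for s, cb in enumerate(codebooks)])

X = np.random.randn(1000, 64).astype(np.float32)
codebooks, codes = pq_train(X)          # 4 bytes per word vs. 256 bytes uncompressed
X_approx = pq_decode(codebooks, codes)
```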
Advances in Pre-Training Distributed Word Representations
- Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin
- Computer Science · LREC
- 26 December 2017
This paper shows how to train high-quality word vector representations by combining known tricks that are, however, rarely used together, outperforming the current state of the art by a large margin on a number of tasks.
Parseval Networks: Improving Robustness to Adversarial Examples
- Moustapha Cissé, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier
- Computer Science · ICML
- 28 April 2017
It is shown that Parseval networks match the state of the art in accuracy on CIFAR-10/100 and Street View House Numbers while being more robust than their vanilla counterparts against adversarial examples.
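A minimal sketch of the Parseval retraction step described in the paper, which after each gradient update nudges a weight matrix toward orthonormal rows; the `beta` value here is illustrative.

```python
import torch

def parseval_step(weight, beta=0.0003):
    """Retraction applied after each gradient update, pushing W toward
    orthonormal rows (W Wᵀ ≈ I): W ← (1 + β) W − β W Wᵀ W."""
    with torch.no_grad():
        W = weight.view(weight.size(0), -1)  # flatten conv kernels to 2-D
        W.copy_((1 + beta) * W - beta * (W @ W.t() @ W))

layer = torch.nn.Linear(128, 64)
# ... optimizer.step() would go here ...
parseval_step(layer.weight)
```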
Colorless Green Recurrent Networks Dream Hierarchically
- Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni
- Computer Science · NAACL
- 29 March 2018
The authors' language-model-trained RNNs make reliable predictions about long-distance agreement and do not lag far behind human performance, supporting the hypothesis that RNNs are not just shallow pattern extractors but also acquire deeper grammatical competence.
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Interestingly, the performance of this method improves significantly as the number of retrieved passages increases, evidence that sequence-to-sequence models offer a flexible framework for efficiently aggregating and combining evidence from multiple passages.
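A toy tensor-level sketch of the fusion-in-decoder scheme, assuming generic PyTorch encoder and decoder stacks rather than the pretrained seq2seq model used in the paper: passages are encoded independently, and their hidden states are concatenated so the decoder's cross-attention fuses evidence from all of them.

```python
import torch
import torch.nn as nn

d_model, n_passages, seq_len = 64, 10, 32
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
dec = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

# Encode each (question + passage) pair independently.
passage_embeddings = torch.randn(n_passages, seq_len, d_model)
encoded = enc(passage_embeddings)                        # (10, 32, 64)

# Fusion in the decoder: concatenate all encoder states along the sequence
# axis so cross-attention sees evidence from every passage at once.
memory = encoded.reshape(1, n_passages * seq_len, d_model)
answer_prefix = torch.randn(1, 8, d_model)               # toy decoder inputs
out = dec(answer_prefix, memory)                         # (1, 8, 64)
```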
Reducing Transformer Depth on Demand with Structured Dropout
LayerDrop, a form of structured dropout, is explored; it has a regularization effect during training and allows for efficient pruning at inference time, showing that sub-networks of any depth can be selected from one large network without finetuning and with limited impact on performance.
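A minimal sketch of LayerDrop on a Transformer encoder, using a hypothetical `LayerDropEncoder` wrapper: whole layers are skipped at random during training, and at inference a fixed subset (for example, every other layer) can be kept without finetuning.

```python
import torch
import torch.nn as nn

class LayerDropEncoder(nn.Module):
    """Transformer encoder whose layers are each skipped with probability
    p_drop during training (the residual path carries x through unchanged)."""
    def __init__(self, num_layers=12, d_model=512, nhead=8, p_drop=0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
             for _ in range(num_layers)])
        self.p_drop = p_drop

    def forward(self, x, keep_every=1):
        for i, layer in enumerate(self.layers):
            if self.training and torch.rand(1).item() < self.p_drop:
                continue  # randomly drop the whole layer during training
            if not self.training and i % keep_every != 0:
                continue  # structured pruning at inference, no finetuning
            x = layer(x)
        return x

enc = LayerDropEncoder().eval()
y = enc(torch.randn(2, 16, 512), keep_every=2)  # keep every other layer
```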