RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Yinhan Liu, Myle Ott, Veselin Stoyanov
- Computer Science, ArXiv
- 26 July 2019
It is found that BERT was significantly undertrained and, when pretrained with a better-tuned procedure, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
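As a concrete illustration of the masked-language-model interface such a pretrained checkpoint exposes, here is a minimal sketch using the publicly released `roberta-base` weights; it assumes the Hugging Face `transformers` package rather than the paper's own fairseq code.

```python
# Minimal sketch: query a pretrained RoBERTa checkpoint as a masked LM.
# Assumes the Hugging Face `transformers` package and the public
# `roberta-base` weights; this is not the paper's own training code.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is "<mask>".
for candidate in unmasker("The capital of France is <mask>.", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```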
Unsupervised Cross-lingual Representation Learning at Scale
- Alexis Conneau, Kartikay Khandelwal, Veselin Stoyanov
- Computer Science, Annual Meeting of the Association for Computational Linguistics
- 5 November 2019
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is demonstrated for the first time.
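A quick hedged sketch of what "one model, many languages" means in practice, assuming the public `xlm-roberta-base` checkpoint on Hugging Face:

```python
# Sketch: a single multilingual checkpoint answers masked-LM queries in
# different languages. Assumes the public `xlm-roberta-base` weights.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="xlm-roberta-base")

# Same model, two languages; XLM-R's mask token is "<mask>".
print(unmasker("La capitale de la France est <mask>.", top_k=1))
print(unmasker("Die Hauptstadt von Frankreich ist <mask>.", top_k=1))
```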
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
- Myle Ott, Yejin Choi, Claire Cardie, J. Hancock
- Computer Science, Annual Meeting of the Association for Computational Linguistics
- 19 June 2011
This work develops and compares three approaches to detecting deceptive opinion spam, ultimately producing a classifier that is nearly 90% accurate on the authors' gold-standard opinion spam dataset, and reveals a relationship between deceptive opinions and imaginative writing.
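For flavor, a hedged sketch of the kind of n-gram text-categorization baseline such work compares, with hypothetical placeholder data standing in for the gold-standard dataset:

```python
# Illustrative n-gram text-categorization baseline: TF-IDF unigram/bigram
# features feeding a linear SVM. The tiny `reviews`/`labels` lists are
# hypothetical placeholders, not the authors' gold-standard dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "the room was clean and the staff answered every question",    # truthful
    "my stay at this luxurious hotel was an unforgettable dream",  # deceptive
    "check-in took ten minutes and the bed was comfortable",       # truthful
    "the most amazing, magical experience of my entire life",      # deceptive
]
labels = [0, 1, 0, 1]  # 0 = truthful, 1 = deceptive

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(reviews, labels)
print(clf.predict(["an absolutely magical dream vacation"]))
```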
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
- Myle Ott, Sergey Edunov, Michael Auli
- Computer Science, North American Chapter of the Association for Computational Linguistics
- 1 April 2019
Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
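A minimal sketch of driving fairseq from Python through its published torch.hub entry points; the checkpoint name and keyword arguments follow the fairseq README, and downloading the pretrained WMT'19 model (plus its Moses/fastBPE tokenizer dependencies) is assumed:

```python
# Sketch: load a pretrained fairseq translation model via torch.hub and
# translate one sentence. Assumes the released WMT'19 En-De checkpoint and
# its tokenizer/BPE dependencies (sacremoses, fastBPE) are installed.
import torch

en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de.single_model",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()
print(en2de.translate("Machine translation is now remarkably fluent."))
```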
Understanding Back-Translation at Scale
- Sergey Edunov, Myle Ott, Michael Auli, David Grangier
- Computer Science, Conference on Empirical Methods in Natural Language Processing
- 1 August 2018
This work broadens the understanding of back-translation and investigates a number of methods for generating synthetic source sentences, finding that in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective.
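As a schematic of the recipe studied here, a hedged sketch of sampling-based back-translation; `backward_model` is a hypothetical target-to-source model, not an interface from the paper:

```python
# Schematic sketch of back-translation with sampling: translate real
# *target*-language monolingual text back into the source language, then
# pair the synthetic source with the genuine target. `backward_model` is
# a hypothetical target-to-source model exposing a .sample() method.
def augment_with_back_translation(backward_model, monolingual_targets):
    synthetic_pairs = []
    for target_sentence in monolingual_targets:
        # Sampling (rather than pure beam search) injects the diversity
        # the paper finds helpful outside resource-poor settings.
        synthetic_source = backward_model.sample(target_sentence)
        synthetic_pairs.append((synthetic_source, target_sentence))
    return synthetic_pairs  # extra parallel data for forward-model training
```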
Phrase-Based & Neural Unsupervised Machine Translation
- Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, M. Ranzato
- Computer Science, Conference on Empirical Methods in Natural Language Processing
- 20 April 2018
This work investigates how to learn to translate given access only to large monolingual corpora in each language, proposing two model variants, one neural and one phrase-based, that are significantly better than methods from the literature while being simpler and having fewer hyper-parameters.
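A schematic, all-placeholder sketch of the two ingredients such unsupervised systems alternate, denoising autoencoding and iterative back-translation; none of these names come from the paper's code:

```python
# Schematic sketch of one unsupervised MT training pass. `s2t`/`t2s` are
# hypothetical source->target / target->source models; `noise` corrupts a
# sentence (word drops/shuffles); `train_step(model, x, y)` is one update
# teaching `model` to map input x to output y. All names are placeholders.
def unsupervised_mt_epoch(s2t, t2s, src_mono, tgt_mono, noise, train_step):
    for src_sent, tgt_sent in zip(src_mono, tgt_mono):
        # 1) Denoising: each direction learns to emit clean text in its
        #    output language from a corrupted sentence in that language.
        train_step(t2s, noise(src_sent), src_sent)
        train_step(s2t, noise(tgt_sent), tgt_sent)

        # 2) Iterative back-translation: translate with the current models
        #    and train the opposite direction on the synthetic pair.
        train_step(t2s, s2t.translate(src_sent), src_sent)
        train_step(s2t, t2s.translate(tgt_sent), tgt_sent)
```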
Recipes for Building an Open-Domain Chatbot
- Stephen Roller, Emily Dinan, J. Weston
- Computer Science, Conference of the European Chapter of the Association for Computational Linguistics
- 28 April 2020
Human evaluations show the best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements, and the limitations of this work are discussed by analyzing failure cases of the models.
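A minimal sketch of conversing with one of the released model distillations, assuming the `facebook/blenderbot-400M-distill` checkpoint on Hugging Face:

```python
# Sketch: generate one chatbot reply with a released BlenderBot distillation.
# Assumes the `facebook/blenderbot-400M-distill` checkpoint on Hugging Face.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/blenderbot-400M-distill"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("Hello, how was your day?", return_tensors="pt")
reply_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```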
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
- Alexander Rives, Siddharth Goyal, R. Fergus
- Computer Science, Biology, Proceedings of the National Academy of Sciences
- 29 April 2019
This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, and finds that, without prior knowledge, information about fundamental properties of proteins, such as secondary structure, contacts, and biological activity, emerges in the learned representations.
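A hedged sketch of pulling per-residue representations from the released ESM-1b model via the facebookresearch/esm torch.hub entry points; treat the exact checkpoint name and output keys as assumptions if the release has since moved on:

```python
# Sketch: extract per-residue embeddings from the released ESM-1b protein
# language model. Checkpoint name and output layout follow the
# facebookresearch/esm README; treat them as assumptions.
import torch

model, alphabet = torch.hub.load("facebookresearch/esm:main", "esm1b_t33_650M_UR50S")
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])
residue_embeddings = out["representations"][33]  # (batch, length, hidden)
print(residue_embeddings.shape)
```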
Scaling Neural Machine Translation
- Myle Ott, Sergey Edunov, David Grangier, Michael Auli
- Computer Science, Conference on Machine Translation
- 1 June 2018
This paper shows that, with careful tuning and implementation, reduced-precision and large-batch training can speed up training by nearly 5x on a single 8-GPU machine.
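The two ingredients, half-precision compute and very large effective batches, look roughly like this in generic PyTorch; `model`, `optimizer`, `batches`, and `loss_fn` are placeholders, not the paper's fairseq training loop:

```python
# Generic-PyTorch sketch of reduced-precision training plus gradient
# accumulation (large effective batches). Placeholders throughout; this
# is not the paper's fairseq implementation.
import torch

ACCUM_STEPS = 16  # effective batch = 16 x per-step batch

def train(model, optimizer, batches, loss_fn):
    scaler = torch.cuda.amp.GradScaler()  # loss scaling for FP16 stability
    optimizer.zero_grad()
    for step, (src, tgt) in enumerate(batches):
        with torch.cuda.amp.autocast():          # FP16 forward pass
            loss = loss_fn(model(src), tgt) / ACCUM_STEPS
        scaler.scale(loss).backward()            # scaled backward
        if (step + 1) % ACCUM_STEPS == 0:        # delayed update = big batch
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
```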
Negative Deceptive Opinion Spam
- Myle Ott, Claire Cardie, J. Hancock
- Computer Science, North American Chapter of the Association for Computational Linguistics
- 1 June 2013
This work creates and studies the first dataset of deceptive opinion spam with negative-sentiment reviews, and finds that standard n-gram text-categorization techniques can detect negative deceptive opinion spam with performance far surpassing that of human judges.
...