Language Modeling with Gated Convolutional Networks
- Y. Dauphin, Angela Fan, Michael Auli, David Grangier
- International Conference on Machine Learning
- 23 December 2016
A finite-context approach based on stacked gated convolutions is developed; it can be more efficient than recurrent models because it allows parallelization over sequential tokens, and it is the first non-recurrent approach to be competitive with strong recurrent models on large-scale language modeling tasks.
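As a rough illustration of the gating mechanism the paper is built on, the following is a minimal sketch of a gated convolutional block (GLU-style gating) in PyTorch; the class name and sizes are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """Minimal sketch of a gated convolutional block (GLU-style gating):
    a causal 1D convolution produces two sets of channels, one of which
    gates the other, i.e. h = (X*W + b) * sigmoid(X*V + c)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.pad = kernel_size - 1  # left padding keeps the model causal
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))
        a, b = self.conv(x).chunk(2, dim=1)
        return a * torch.sigmoid(b)

# All positions are computed in one convolution pass, so training
# parallelizes over the tokens of a sequence.
out = GatedConvBlock(channels=16)(torch.randn(2, 16, 10))  # -> (2, 16, 10)
```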
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
- Myle Ott, Sergey Edunov, Michael Auli
- North American Chapter of the Association for Computational Linguistics
- 1 April 2019
Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
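A minimal usage sketch based on fairseq's `from_pretrained` hub interface follows; the checkpoint and data-bin paths are placeholders and assume a model already trained and binarized with fairseq's command-line tools.

```python
from fairseq.models.transformer import TransformerModel

# Hypothetical paths: a checkpoint produced by fairseq-train and a
# binarized data directory produced by fairseq-preprocess.
model = TransformerModel.from_pretrained(
    'checkpoints/',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='data-bin/wmt17_en_de',
    bpe='subword_nmt',
    bpe_codes='data-bin/wmt17_en_de/bpecodes',
)
print(model.translate('Hello world!'))
```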
Hierarchical Neural Story Generation
- Angela Fan, M. Lewis, Y. Dauphin
- Annual Meeting of the Association for Computational Linguistics
- 1 May 2018
This work collects a large dataset of 300K human-written stories paired with writing prompts from an online forum and proposes hierarchical story generation, where the model first generates a premise and then transforms it into a passage of text.
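The two-stage structure can be sketched as below; the premise and story models are stand-in callables for two sequence-to-sequence generators, and the separator token is a hypothetical choice.

```python
def hierarchical_generate(prompt, premise_model, story_model):
    """Two-stage generation: first a short premise, then the full passage
    conditioned on both the prompt and the premise."""
    premise = premise_model(prompt)
    story = story_model(prompt + " <SEP> " + premise)
    return premise, story

# Toy stand-ins for the two generation models.
premise, story = hierarchical_generate(
    "A door appears in the desert.",
    premise_model=lambda p: "A wanderer decides to open it.",
    story_model=lambda p: f"[story conditioned on: {p}]",
)
```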
Wizard of Wikipedia: Knowledge-Powered Conversational agents
- Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, J. Weston
- International Conference on Learning Representations
- 27 September 2018
The best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while a new benchmark allows for measuring further improvements in this important research direction.
Pay Less Attention with Lightweight and Dynamic Convolutions
- Felix Wu, Angela Fan, Alexei Baevski, Y. Dauphin, Michael Auli
- International Conference on Learning Representations
- 29 January 2019
It is shown that a very lightweight convolution can perform competitively with the best reported self-attention results, and dynamic convolutions are introduced, which are simpler and more efficient than self-attention.
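A minimal sketch of the lightweight convolution idea, assuming a depthwise causal 1D convolution with softmax-normalized kernels shared across channel groups; names and defaults are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightConv(nn.Module):
    """Sketch of a lightweight convolution: a depthwise 1D convolution whose
    kernels are softmax-normalized over the time dimension and shared across
    groups of channels ("heads"), using far fewer parameters than either
    per-channel kernels or self-attention."""
    def __init__(self, channels: int, kernel_size: int = 3, heads: int = 4):
        super().__init__()
        assert channels % heads == 0
        self.channels, self.heads, self.kernel_size = channels, heads, kernel_size
        self.weight = nn.Parameter(torch.randn(heads, 1, kernel_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        w = F.softmax(self.weight, dim=-1)                       # normalize each kernel
        w = w.repeat_interleave(self.channels // self.heads, 0)  # share within each head
        x = F.pad(x, (self.kernel_size - 1, 0))                  # causal left padding
        return F.conv1d(x, w, groups=self.channels)              # depthwise convolution
```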
Reducing Transformer Depth on Demand with Structured Dropout
- Angela Fan, Edouard Grave, Armand Joulin
- International Conference on Learning Representations
- 25 September 2019
LayerDrop, a form of structured dropout, is explored: it has a regularization effect during training and allows for efficient pruning at inference time. It is shown that sub-networks of any depth can be selected from one large network without finetuning and with limited impact on performance.
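The core mechanism can be sketched in a few lines: during training each layer of a stack is skipped with some probability, so at inference a shallower sub-network (e.g. keeping every other layer) still works. This is an illustrative sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Sketch of structured layer dropout: each layer is skipped entirely
    with probability p during training; at inference time, keeping only a
    subset of layers yields a pruned, shallower model."""
    def __init__(self, layers: nn.ModuleList, p: float = 0.2):
        super().__init__()
        self.layers = layers
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if self.training and torch.rand(1).item() < self.p:
                continue  # drop the whole layer; x passes through unchanged
            x = layer(x)
        return x

stack = LayerDropStack(nn.ModuleList(nn.Linear(8, 8) for _ in range(6)), p=0.5)
out = stack(torch.randn(3, 8))
```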
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
- Y. Tang, C. Tran, Angela Fan
- ArXiv
- 2 August 2020
This work shows that multilingual translation models can be created through multilingual finetuning, and demonstrates that pretrained models can be extended to incorporate additional languages without loss of performance.
ELI5: Long Form Question Answering
- Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, J. Weston, Michael Auli
- Annual Meeting of the Association for Computational Linguistics
- 1 July 2019
This work introduces the first large-scale corpus for long form question answering, a task requiring elaborate and in-depth answers to open-ended questions, and shows that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline.
Controllable Abstractive Summarization
- Angela Fan, David Grangier, Michael Auli
- NMT@ACL
- 14 November 2017
A neural summarization model is introduced with a simple but effective mechanism that enables users to specify high-level attributes in order to control the shape of the final summaries to better suit their needs.
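One common way to realize this kind of attribute control is to prepend special marker tokens to the source text so the model learns to condition on them; the sketch below uses hypothetical token names (length buckets and entity markers) purely for illustration.

```python
def add_control_tokens(article: str, length_bucket: int = None, entities: list = None) -> str:
    """Sketch of attribute control via prefix tokens: desired high-level
    attributes are prepended to the source so the summarizer conditions on
    them. Token names (<len_*>, <ent>) are hypothetical."""
    prefix = []
    if length_bucket is not None:
        prefix.append(f"<len_{length_bucket}>")
    for ent in entities or []:
        prefix.append(f"<ent> {ent}")
    return " ".join(prefix + [article])

src = add_control_tokens("The company reported record earnings ...",
                         length_bucket=2, entities=["Acme Corp"])
```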
Nearest Neighbor Machine Translation
- Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, M. Lewis
- International Conference on Learning Representations
- 1 October 2020
We introduce k-nearest-neighbor machine translation (kNN-MT), which predicts tokens with a nearest neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search.
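A minimal sketch of the retrieval-and-interpolation step, assuming a pre-built datastore of cached decoder states and their target tokens; the function name, hyperparameters, and tensor layout are illustrative.

```python
import torch
import torch.nn.functional as F

def knn_interpolate(hidden, model_probs, keys, values, k=8, temperature=10.0, lam=0.5):
    """Sketch of kNN-MT prediction: retrieve the k nearest cached decoder
    states, turn their negative distances into a distribution over the
    stored target tokens, and interpolate with the base model.

    hidden:      (d,)      current decoder representation
    model_probs: (vocab,)  base model's next-token distribution
    keys:        (n, d)    cached decoder representations
    values:      (n,)      target token ids (long) stored with each key
    """
    dists = torch.cdist(hidden.unsqueeze(0), keys).squeeze(0)   # (n,)
    nn_dist, nn_idx = dists.topk(k, largest=False)              # k closest entries
    weights = F.softmax(-nn_dist / temperature, dim=0)          # closer = heavier
    knn_probs = torch.zeros_like(model_probs)
    knn_probs.scatter_add_(0, values[nn_idx], weights)          # aggregate per token
    return lam * knn_probs + (1 - lam) * model_probs
```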
...