Publications
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
TLDR
We propose a new Q&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
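A minimal sketch (PyTorch assumed) of a recurrence-free encoder block in the spirit of this summary: convolution handles local context, self-attention handles global context. Layer sizes, the depthwise-separable layout, and all names are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    def __init__(self, d_model=128, kernel_size=7, n_heads=8):
        super().__init__()
        # Depthwise-separable convolution models local interactions.
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2,
                      groups=d_model),              # depthwise
            nn.Conv1d(d_model, d_model, 1),         # pointwise
            nn.ReLU(),
        )
        # Self-attention models global interactions across the whole sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(x + c)                        # residual around convolution
        a, _ = self.attn(x, x, x)
        return self.norm2(x + a)                     # residual around attention

x = torch.randn(2, 50, 128)
print(ConvAttentionBlock()(x).shape)                 # torch.Size([2, 50, 128])
```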
Multi-task Sequence to Sequence Learning
TLDR
This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting, where the encoder is shared between several tasks such as machine translation and syntactic parsing; (b) the many-to-one setting, useful when only the decoder can be shared, as in the case of translation and image caption generation; and (c) the many-to-many setting, where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation.
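A minimal sketch (PyTorch assumed) of the one-to-many setting from the summary: one shared encoder feeding a separate decoder head per task, e.g. translation and parsing. Module names, sizes, and the single-layer LSTMs are illustrative.

```python
import torch
import torch.nn as nn

class OneToManySeq2Seq(nn.Module):
    def __init__(self, vocab=10000, d=256, tasks=("translate", "parse")):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.LSTM(d, d, batch_first=True)           # shared across all tasks
        self.decoders = nn.ModuleDict(
            {t: nn.LSTM(d, d, batch_first=True) for t in tasks}  # task-specific decoders
        )
        self.heads = nn.ModuleDict({t: nn.Linear(d, vocab) for t in tasks})

    def forward(self, src, tgt, task):
        _, state = self.encoder(self.embed(src))       # encode the source once
        out, _ = self.decoders[task](self.embed(tgt), state)
        return self.heads[task](out)                   # per-token logits for this task

model = OneToManySeq2Seq()
src = torch.randint(0, 10000, (4, 12))
tgt = torch.randint(0, 10000, (4, 9))
print(model(src, tgt, "translate").shape)              # torch.Size([4, 9, 10000])
```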
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
TLDR
We propose a new pre-training task called replaced token detection.
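A minimal sketch (PyTorch assumed) of a replaced-token-detection objective: corrupt some input tokens with substitutes, then train a model to predict, for every position, whether the token is original or replaced. The random-token substitution is a stand-in for the small generator the paper uses, and the encoder and head names are illustrative.

```python
import torch
import torch.nn as nn

def replaced_token_detection_loss(encoder, head, tokens, replace_prob=0.15, vocab=10000):
    # Corrupt a random subset of positions (random tokens stand in for
    # generator-sampled replacements).
    mask = torch.rand(tokens.shape) < replace_prob
    corrupted = torch.where(mask, torch.randint_like(tokens, vocab), tokens)
    labels = (corrupted != tokens).float()            # 1 = replaced, 0 = original
    logits = head(encoder(corrupted)).squeeze(-1)     # one logit per position
    return nn.functional.binary_cross_entropy_with_logits(logits, labels)

encoder = nn.Sequential(nn.Embedding(10000, 128), nn.Linear(128, 128), nn.ReLU())
head = nn.Linear(128, 1)
tokens = torch.randint(0, 10000, (8, 32))
print(replaced_token_detection_loss(encoder, head, tokens))
```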
A Hierarchical Neural Autoencoder for Paragraphs and Documents
TLDR
We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph.
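A minimal sketch (PyTorch assumed) of the hierarchy described in the summary: a word-level LSTM turns each sentence into a vector, and a sentence-level LSTM turns the sentence vectors into a paragraph embedding; decoding would mirror this process. Only the encoder side is sketched, and sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab=10000, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.word_lstm = nn.LSTM(d, d, batch_first=True)   # words -> sentence vectors
        self.sent_lstm = nn.LSTM(d, d, batch_first=True)   # sentence vectors -> paragraph vector

    def forward(self, paragraph):                  # (n_sentences, n_words) token ids
        _, (h_word, _) = self.word_lstm(self.embed(paragraph))
        sent_vecs = h_word[-1].unsqueeze(0)        # (1, n_sentences, d)
        _, (h_sent, _) = self.sent_lstm(sent_vecs)
        return h_sent[-1].squeeze(0)               # paragraph embedding, shape (d,)

paragraph = torch.randint(0, 10000, (5, 20))       # 5 sentences of 20 tokens each
print(HierarchicalEncoder()(paragraph).shape)      # torch.Size([128])
```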
Self-Training With Noisy Student Improves ImageNet Classification
TLDR
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the previous state-of-the-art model that requires 3.5B weakly labeled Instagram images.
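A minimal sketch (PyTorch assumed) of the self-training loop: a teacher pseudo-labels unlabeled data, a noised student (dropout here stands in for the paper's data augmentation and stochastic depth) is trained on labeled plus pseudo-labeled data, and the student becomes the next teacher. The toy data, model, and training step are illustrative, not the ImageNet setup.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 10))

x_lab, y_lab = torch.randn(32, 16), torch.randint(0, 10, (32,))
x_unlab = torch.randn(128, 16)

teacher = make_model()
for generation in range(3):
    teacher.eval()
    with torch.no_grad():
        pseudo = teacher(x_unlab).argmax(dim=1)    # pseudo-labels from the teacher
    student = make_model()                         # fresh student (equal or larger in the paper)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    student.train()                                # dropout active = noised student
    for _ in range(100):
        x = torch.cat([x_lab, x_unlab])
        y = torch.cat([y_lab, pseudo])
        loss = nn.functional.cross_entropy(student(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    teacher = student                              # the student becomes the next teacher
print("finished", generation + 1, "self-training generations")
```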
Unsupervised Data Augmentation for Consistency Training
TLDR
We propose a semi-supervised learning method that applies data augmentation to unlabeled data for consistency training and outperforms the state of the art.
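A minimal sketch (PyTorch assumed) of consistency training on unlabeled data: the total loss combines supervised cross-entropy with a term that pushes the model's prediction on an augmented unlabeled example toward its prediction on the original. The Gaussian-noise "augmentation" is a placeholder for the strong, task-specific augmentations the paper relies on, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def uda_loss(model, x_lab, y_lab, x_unlab, unsup_weight=1.0):
    sup = F.cross_entropy(model(x_lab), y_lab)
    with torch.no_grad():                                    # target distribution, no gradient
        target = F.softmax(model(x_unlab), dim=1)
    augmented = x_unlab + 0.1 * torch.randn_like(x_unlab)    # placeholder augmentation
    pred_log = F.log_softmax(model(augmented), dim=1)
    unsup = F.kl_div(pred_log, target, reduction="batchmean")
    return sup + unsup_weight * unsup

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
x_lab, y_lab = torch.randn(8, 16), torch.randint(0, 10, (8,))
x_unlab = torch.randn(32, 16)
print(uda_loss(model, x_lab, y_lab, x_unlab))
```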
Massive Exploration of Neural Machine Translation Architectures
TLDR
We present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters.
Unsupervised Data Augmentation
TLDR
We propose to apply state-of-the-art data augmentation methods found in supervised learning as the perturbation function in a semi-supervised learning setting.
Semi-Supervised Sequence Modeling with Cross-View Training
TLDR
We propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data.
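A minimal sketch (PyTorch assumed) of the cross-view idea from the summary: on unlabeled data, auxiliary prediction heads that see only restricted views of the encoder output are trained to match the full-view primary prediction. A feed-forward encoder and crude feature masking stand in for the paper's Bi-LSTM and its forward/backward partial views; all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU())
primary = nn.Linear(64, 5)                           # sees the full representation
auxiliaries = nn.ModuleList([nn.Linear(64, 5) for _ in range(2)])

def cvt_unsup_loss(x_unlab):
    h = encoder(x_unlab)
    with torch.no_grad():
        target = F.softmax(primary(h), dim=1)        # primary prediction is the training target
    loss = 0.0
    for i, aux in enumerate(auxiliaries):
        view = h.clone()
        view[:, i::2] = 0                            # crude "restricted view": zero half the features
        loss = loss + F.kl_div(F.log_softmax(aux(view), dim=1), target,
                               reduction="batchmean")
    return loss

print(cvt_unsup_loss(torch.randn(16, 16)))
```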
Towards a Human-like Open-Domain Chatbot
TLDR
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations.