Publications
Sequence to Sequence Learning with Neural Networks
TLDR
In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure.
  • Citations: 12,141
  • Highly influential citations: 1,092
  • PDF
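The core idea is an encoder LSTM that reads the source sequence into a fixed-size state, which then initializes a decoder LSTM that emits the target sequence. A minimal PyTorch sketch of that encoder-decoder shape (the single-layer LSTMs, vocabulary sizes, and dimensions here are illustrative; the paper used deep multilayer LSTMs and fed the source sequence in reversed order):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder LSTM compresses the source
    sequence into a fixed-size state, which initializes the decoder LSTM."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))   # state = (h, c)
        out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.proj(out)                        # logits per target step

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),   # batch of source token ids
               torch.randint(0, 1000, (2, 5)))   # shifted target token ids
print(logits.shape)  # torch.Size([2, 5, 1000])
```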
Distributed Representations of Sentences and Documents
TLDR
We propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents.
  • Citations: 5,939
  • Highly influential citations: 858
  • PDF
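Paragraph Vector is implemented in gensim as Doc2Vec; a small sketch with a hypothetical toy corpus (the tokens, tags, and hyperparameters are illustrative):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each document is a token list plus a tag; the model learns one vector per tag.
docs = [TaggedDocument(words=["deep", "learning", "for", "text"], tags=[0]),
        TaggedDocument(words=["paragraph", "vectors", "embed", "documents"], tags=[1])]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

# Unseen documents get a fixed-length vector by gradient-based inference.
vec = model.infer_vector(["new", "unseen", "document"])
print(vec.shape)  # (50,)
```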
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
We propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient.
  • Citations: 1,961
  • Highly influential citations: 457
  • PDF
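The compound coefficient is simple arithmetic: depth, width, and resolution grow as α^φ, β^φ, γ^φ under the constraint α·β²·γ² ≈ 2, so FLOPs roughly double per unit of φ. A sketch using the constants α=1.2, β=1.1, γ=1.15 found by grid search in the paper (the base depth/width/resolution values here are illustrative):

```python
# Compound scaling: for a chosen coefficient phi, depth, width, and
# resolution grow as ALPHA**phi, BETA**phi, GAMMA**phi, with the constraint
# ALPHA * BETA**2 * GAMMA**2 ~= 2 so FLOPs roughly double per unit of phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # constants from the paper's grid search

def compound_scale(phi, base_depth=18, base_width=1.0, base_res=224):
    depth = round(base_depth * ALPHA ** phi)   # number of layers
    width = base_width * BETA ** phi           # channel multiplier
    res = int(round(base_res * GAMMA ** phi))  # input image resolution
    return depth, width, res

for phi in range(4):
    print(phi, compound_scale(phi))
```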
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TLDR
We propose XLNet, a generalized autoregressive pretraining method that leverages the best of both autoregressive (AR) language modeling and autoencoding (AE) while avoiding their limitations.
  • Citations: 2,351
  • Highly influential citations: 413
  • PDF
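The AR-over-all-orders idea can be illustrated by its attention masking: sample a factorization order, then let each position attend only to positions that precede it in that order. A minimal numpy sketch of the mask alone (real XLNet adds two-stream attention and predicts only a subset of positions, which this omits):

```python
import numpy as np

def permutation_mask(seq_len, rng):
    """Attention mask for one sampled factorization order: position i may
    attend to position j iff j precedes i in the sampled permutation."""
    order = rng.permutation(seq_len)
    rank = np.empty(seq_len, dtype=int)
    rank[order] = np.arange(seq_len)      # rank[i] = place of token i in the order
    mask = rank[:, None] > rank[None, :]  # True where attention is allowed
    return order, mask

order, mask = permutation_mask(5, np.random.default_rng(0))
print(order)
print(mask.astype(int))
```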
Neural Architecture Search with Reinforcement Learning
TLDR
We use a recurrent network to generate the model descriptions of neural networks and train it with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
  • Citations: 2,480
  • Highly influential citations: 346
  • PDF
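The training signal is plain REINFORCE: sample an architecture from the controller, score it, and reinforce the sampled decisions in proportion to reward minus a baseline. A toy PyTorch sketch with a table of logits standing in for the RNN controller and a fake reward standing in for "train the child network and measure validation accuracy":

```python
import torch
import torch.nn as nn

n_choices, steps = 4, 3
logits = nn.Parameter(torch.zeros(steps, n_choices))  # toy controller
opt = torch.optim.Adam([logits], lr=0.1)

def fake_reward(arch):
    # Stand-in for training the sampled child network on real data.
    return float(sum(arch)) / (steps * (n_choices - 1))

baseline = 0.0
for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    arch = dist.sample()                      # one decision per step
    reward = fake_reward(arch.tolist())
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    loss = -(reward - baseline) * dist.log_prob(arch).sum()
    opt.zero_grad(); loss.backward(); opt.step()

print(logits.argmax(dim=1))  # converges toward the highest-reward choices
```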
Exploiting Similarities among Languages for Machine Translation
TLDR
This paper develops a method that can automate the process of generating and extending dictionaries and translation tables for any language pair.
  • Citations: 1,153
  • Highly influential citations: 344
  • PDF
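At its core the method fits a linear map W between two monolingual embedding spaces from a small seed dictionary, then translates by nearest neighbor in the target space. A numpy sketch with random vectors standing in for pretrained word embeddings (rows of X and Z are assumed to be translation pairs):

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n_pairs = 50, 50, 500

# Hypothetical stand-ins for pretrained monolingual word vectors of the
# seed dictionary: row i of X and row i of Z are translations of each other.
X = rng.normal(size=(n_pairs, d_src))
Z = rng.normal(size=(n_pairs, d_tgt))

# Fit W minimizing ||X @ W - Z||^2 by ordinary least squares.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def translate(x, target_vocab_vectors):
    """Map a source vector into target space, return nearest neighbor index."""
    z_hat = x @ W
    sims = target_vocab_vectors @ z_hat / (
        np.linalg.norm(target_vocab_vectors, axis=1) * np.linalg.norm(z_hat) + 1e-9)
    return int(np.argmax(sims))

print(translate(X[0], Z))  # with a real seed dictionary this is often 0
```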
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
TLDR
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems.
  • Citations: 3,508
  • Highly influential citations: 274
  • PDF
Efficient Neural Architecture Search via Parameter Sharing
TLDR
We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design.
  • Citations: 1,177
  • Highly influential citations: 273
  • PDF
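The speedup comes from a single bank of shared weights: every sampled child architecture selects operations from it rather than training its own parameters from scratch. A minimal PyTorch sketch of that sharing (the two linear ops and the fixed op list are illustrative; ENAS searches over recurrent cells and convolutional ops with an RNN controller):

```python
import torch
import torch.nn as nn

class SharedOps(nn.Module):
    """One shared weight bank: every sampled child architecture picks an
    operation per layer, but all children reuse the same parameters."""
    def __init__(self, dim=32, n_layers=4):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.ModuleDict({
                "linear": nn.Linear(dim, dim),
                "linear_relu": nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
            }) for _ in range(n_layers)
        ])

    def forward(self, x, arch):
        # arch: one op name per layer, e.g. ["linear", "linear_relu", ...]
        for layer, choice in zip(self.ops, arch):
            x = layer[choice](x)
        return x

shared = SharedOps()
x = torch.randn(8, 32)
print(shared(x, ["linear", "linear_relu", "linear", "linear_relu"]).shape)
```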
Large Scale Distributed Deep Networks
TLDR
We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS.
  • Citations: 2,618
  • Highly influential citations: 270
  • PDF
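A toy single-process sketch of the Downpour pattern, with a lock standing in for the parameter server and threads standing in for model replicas (a real deployment shards both the data and the parameters across many machines, and updates arrive stale):

```python
import threading
import numpy as np

# Shared "parameter server": replicas fetch current parameters, compute a
# gradient on their own data shard, and push updates asynchronously.
params = np.zeros(10)
lock = threading.Lock()
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, size=(4, 250, 10))  # 4 shards, target mean = 3

def replica(shard, lr=0.05, steps=100):
    for i in range(steps):
        with lock:
            local = params.copy()          # fetch (possibly stale) params
        batch = shard[i % len(shard)]
        grad = local - batch               # gradient of 0.5 * ||w - x||^2
        with lock:
            params -= lr * grad            # push the update asynchronously

threads = [threading.Thread(target=replica, args=(data[k],)) for k in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(params.round(2))  # converges near the data mean (~3)
```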
Regularized Evolution for Image Classifier Architecture Search
TLDR
The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically.
  • Citations: 1,168
  • Highly influential citations: 212
  • PDF
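Regularized evolution is tournament selection plus aging: each cycle picks the best of a random sample as parent, adds a mutated child, and removes the oldest member rather than the worst. A toy sketch with bit-string genomes and a stand-in fitness function in place of training an image classifier:

```python
import random
from collections import deque

random.seed(0)
GENOME_LEN, POP, SAMPLE, CYCLES = 12, 20, 5, 300

def fitness(genome):
    # Stand-in for "train the architecture, report validation accuracy".
    return sum(genome)

def mutate(genome):
    child = list(genome)
    i = random.randrange(GENOME_LEN)
    child[i] = 1 - child[i]  # flip one architectural choice
    return child

# The population is a queue: children join on the right, and the OLDEST
# member is removed each cycle (aging) -- the "regularization" in the title.
population = deque([[random.randint(0, 1) for _ in range(GENOME_LEN)]
                    for _ in range(POP)])
for _ in range(CYCLES):
    sample = random.sample(list(population), SAMPLE)  # tournament
    parent = max(sample, key=fitness)
    population.append(mutate(parent))
    population.popleft()                              # age out the oldest

print(max(fitness(g) for g in population))  # approaches GENOME_LEN
```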