• Publications
  • Influence
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Distributed Representations of Sentences and Documents
TLDR
Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
TLDR
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autore progressive formulation.
Neural Architecture Search with Reinforcement Learning
TLDR
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Exploiting Similarities among Languages for Machine Translation
TLDR
This method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data and uses distributed representation of words and learns a linear mapping between vector spaces of languages.
Efficient Neural Architecture Search via Parameter Sharing
TLDR
Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state-of-the-art among all methods without post-training processing and delivers strong empirical performances using much fewer GPU-hours.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
TLDR
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delicited models.
Searching for MobileNetV3
TLDR
This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art of MobileNets.
Large Scale Distributed Deep Networks
TLDR
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
...
1
2
3
4
5
...