• Publications
  • Influence
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delicited models. Expand
Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that mapsExpand
Tacotron: Towards End-to-End Speech Synthesis
Tacotron is presented, an end-to-end generative text- to-speech model that synthesizes speech directly from characters that achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. Expand
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language. Expand
Exploring the Limits of Language Modeling
This work explores recent advances in Recurrent Neural Networks for large scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. Expand
Conformer: Convolution-augmented Transformer for Speech Recognition
This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies. Expand
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
It is shown that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation. Expand
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers. Expand
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
A variety of structural and optimization improvements to the Listen, Attend, and Spell model are explored, which significantly improve performance and a multi-head attention architecture is introduced, which offers improvements over the commonly-used single- head attention. Expand
Reward Augmented Maximum Likelihood for Neural Structured Prediction
This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework, and shows that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards. Expand