Publications
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems.
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder that synthesizes time-domain waveforms from those spectrograms.
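The architecture described here has two stages: a sequence-to-sequence network that predicts mel spectrogram frames from characters, and a WaveNet-style vocoder that turns those frames into a waveform. A minimal sketch of that inference flow follows; every function is a numpy placeholder standing in for a learned network, and none of the names come from the paper's code.

```python
# Illustrative sketch of a Tacotron 2-style inference pipeline (not the authors' code):
# text -> character ids -> seq2seq feature prediction -> mel spectrogram -> vocoder -> waveform.
import numpy as np

N_MELS = 80          # mel channels predicted per decoder step
FRAMES_PER_CHAR = 6  # rough stand-in for the attention-driven alignment length

def encode_text(text):
    """Map characters to integer ids (placeholder for a learned character embedding)."""
    return np.array([ord(c) for c in text.lower()], dtype=np.int64)

def feature_prediction_network(char_ids):
    """Stand-in for the recurrent seq2seq network: emits one mel frame per decoder step."""
    n_frames = len(char_ids) * FRAMES_PER_CHAR
    rng = np.random.default_rng(0)
    return rng.standard_normal((n_frames, N_MELS)).astype(np.float32)

def vocoder(mel):
    """Stand-in for the WaveNet-style vocoder: mel frames -> time-domain samples."""
    hop = 256  # samples synthesized per mel frame
    return np.repeat(mel.mean(axis=1), hop)

def synthesize(text):
    mel = feature_prediction_network(encode_text(text))
    return vocoder(mel)

audio = synthesize("Hello world")
print(audio.shape)  # one waveform sample array for the whole utterance
```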
Tacotron: Towards End-to-End Speech Synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices.
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no changes to the model architecture from a standard NMT system; instead, it introduces an artificial token at the beginning of the input sentence to specify the required target language.
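The single-model, zero-shot setup works by leaving the NMT architecture untouched and prepending an artificial token to the source sentence that names the desired target language. A minimal sketch of that preprocessing step, assuming a "<2xx>"-style token format (illustrative, not the paper's code):

```python
# Illustrative preprocessing for multilingual NMT: prepend a target-language token
# to the source sentence; a single model with one shared vocabulary then learns all
# directions, including zero-shot pairs never observed together during training.
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prefix the source with an artificial token such as <2es> ("translate to Spanish")."""
    return f"<2{target_lang}> {source_sentence}"

# Training pairs from different language directions share one model unchanged.
examples = [
    (add_target_token("How are you?", "es"), "¿Cómo estás?"),
    (add_target_token("¿Cómo estás?", "en"), "How are you?"),
]
for src, tgt in examples:
    print(src, "->", tgt)
```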
Exploring the Limits of Language Modeling
In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.
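LAS-style models replace the separate acoustic, pronunciation and language models with a single encoder ("listen"), attention ("attend") and decoder ("spell"). A minimal numpy skeleton of that structure follows; the projections and state updates are placeholders for learned layers, and all names are illustrative.

```python
# Illustrative skeleton of an attention-based encoder-decoder ASR model (LAS-style),
# using numpy only; a real system uses learned recurrent or transformer layers.
import numpy as np

def listener(audio_features):
    """Encoder ("listen"): placeholder that passes input frames through unchanged."""
    return audio_features                                   # shape (T, d)

def attend(decoder_state, encoder_states):
    """Dot-product attention over encoder states ("attend")."""
    scores = encoder_states @ decoder_state                 # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ encoder_states                         # context vector, shape (d,)

def speller(encoder_states, n_steps, d):
    """Decoder ("spell"): emits one symbol per step, conditioned on the attention context."""
    rng = np.random.default_rng(0)
    state = np.zeros(d)
    outputs = []
    for _ in range(n_steps):
        context = attend(state, encoder_states)
        state = np.tanh(state + context)                    # placeholder state update
        outputs.append(int(rng.integers(0, 30)))            # placeholder character id
    return outputs

features = np.random.default_rng(1).standard_normal((50, 16))  # 50 frames, 16-dim features
print(speller(listener(features), n_steps=10, d=16))
```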
Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices.
Reward Augmented Maximum Likelihood for Neural Structured Prediction
A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient approach to incorporating task reward into a maximum likelihood framework.
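The core idea of reward augmented maximum likelihood is to keep an ordinary cross-entropy objective but train on outputs sampled in proportion to exp(reward / τ) around the ground truth, rather than only on the ground truth itself. A toy sketch, assuming a negative-Hamming-distance reward and a vocabulary small enough to enumerate the output space exactly (all names illustrative):

```python
# Illustrative sketch of reward augmented maximum likelihood (RAML): instead of training
# only on the ground-truth output y*, sample training targets y with probability
# proportional to exp(reward(y, y*) / tau) and maximize their log-likelihood as usual.
import itertools
import math
import random

VOCAB = "abcde"

def reward(y, y_star):
    """Negative Hamming distance to the ground truth (higher is better)."""
    return -sum(a != b for a, b in zip(y, y_star))

def exponentiated_reward_distribution(y_star, tau=0.8):
    """Exact q(y | y*) ∝ exp(reward(y, y*) / tau) over all strings of the same length.
    Enumeration is only feasible for this toy vocabulary and length."""
    outputs = ["".join(p) for p in itertools.product(VOCAB, repeat=len(y_star))]
    weights = [math.exp(reward(y, y_star) / tau) for y in outputs]
    z = sum(weights)
    return outputs, [w / z for w in weights]

def sample_training_targets(y_star, n=5, tau=0.8):
    """RAML replaces the single ground-truth target with samples drawn from q."""
    outputs, probs = exponentiated_reward_distribution(y_star, tau)
    return random.choices(outputs, weights=probs, k=n)

random.seed(0)
print(sample_training_targets("abcde"))
# The sampled strings are then fed to an ordinary maximum-likelihood (cross-entropy)
# update in place of "abcde"; neither the model nor the loss needs to change.
```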
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: a speaker encoder, a sequence-to-sequence synthesis network, and a neural vocoder.
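The multi-speaker behaviour comes from conditioning the synthesizer on a fixed-dimensional speaker embedding produced by the separately trained speaker encoder, so an unseen voice only needs a short reference clip at inference time. A minimal sketch of that flow, with numpy placeholders standing in for the trained networks (all names illustrative):

```python
# Illustrative sketch of speaker-conditioned TTS inference: a speaker encoder maps a
# short reference utterance to an embedding, and the synthesizer is conditioned on it.
import numpy as np

EMBED_DIM = 256
N_MELS = 80

def speaker_encoder(reference_audio):
    """Placeholder for the verification-trained encoder: audio -> unit-norm embedding."""
    seed = int(np.abs(reference_audio).sum() * 1000) % (2**32)   # derive a voice-specific seed
    e = np.random.default_rng(seed).standard_normal(EMBED_DIM)
    return e / np.linalg.norm(e)

def synthesizer(text, speaker_embedding):
    """Placeholder for a Tacotron-style synthesizer conditioned on the speaker embedding."""
    n_frames = len(text) * 5
    mel = np.random.default_rng(0).standard_normal((n_frames, N_MELS))
    return mel + 0.01 * speaker_embedding[:N_MELS]   # conditioning shown schematically

def vocoder(mel):
    """Placeholder vocoder: mel frames -> waveform samples."""
    return np.repeat(mel.mean(axis=1), 256)

reference = np.random.default_rng(42).standard_normal(16000)    # ~1 s of "reference audio"
embedding = speaker_encoder(reference)                           # the voice to imitate
audio = vocoder(synthesizer("Hello there", embedding))
print(embedding.shape, audio.shape)
```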
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been widely used for training and evaluating automatic speech recognition systems.