Publications
Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin
TLDR
It is shown that an end-to-end deep learning approach can be used to recognize both English and Mandarin Chinese speech, two vastly different languages, and is competitive with the transcription of human workers when benchmarked on standard datasets.
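Deep Speech 2's acoustic models are trained with the CTC loss; as a concrete illustration, here is a minimal sketch of greedy CTC decoding (the paper itself decodes with a beam search plus a language model, so this is the simplest possible decoder, not the authors' method):

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Greedy CTC decoding: argmax label per frame, merge consecutive
    repeats, then drop blanks. Returns a list of label indices."""
    best = np.argmax(logits, axis=1)
    out, prev = [], blank
    for idx in best:
        if idx != prev and idx != blank:
            out.append(int(idx))
        prev = idx
    return out

# Toy example: 3 labels (0 = blank, 1 = 'h', 2 = 'i'); the frame
# sequence [1, 1, 0, 2, 2, 0] should collapse to [1, 2] -> "hi".
logits = np.eye(3)[[1, 1, 0, 2, 2, 0]]
print(ctc_greedy_decode(logits))  # [1, 2]
```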
Deep Voice: Real-time Neural Text-to-Speech
TLDR
Deep Voice lays the groundwork for truly end-to-end neural speech synthesis, shows that inference with the system can be performed faster than real time, and describes optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations.
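WaveNet-style vocoders generate audio one sample at a time, each sample conditioned on the previous ones, which is why per-step inference kernels dominate the cost. A minimal sketch of that serial loop follows; `next_sample_probs` is a hypothetical placeholder for the network's forward pass, not code from Deep Voice:

```python
import numpy as np

rng = np.random.default_rng(0)

def next_sample_probs(context):
    """Hypothetical stand-in for one WaveNet forward pass: map recent
    samples to a distribution over 256 mu-law quantization levels."""
    h = np.tanh(context[-16:].sum())          # placeholder "network"
    logits = rng.normal(h, 1.0, size=256)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Autoregressive generation: one serial step per audio sample. At
# 16 kHz, one second of audio is 16,000 dependent forward passes,
# which is why fast per-step inference kernels are the bottleneck.
samples = np.zeros(16)                        # seed context
for _ in range(160):                          # generate 10 ms at 16 kHz
    p = next_sample_probs(samples)
    s = rng.choice(256, p=p)                  # sample a mu-law level
    samples = np.append(samples, (s - 128) / 128.0)
```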
Relational recurrent neural networks
TLDR
A new memory module, the Relational Memory Core (RMC), employs multi-head dot-product attention to allow memories to interact, and achieves state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets.
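As a rough illustration of the interaction step, the sketch below implements multi-head dot-product self-attention over a set of memory slots in plain numpy; the actual RMC also attends over new inputs and gates the memory update with an LSTM-style mechanism, both omitted here:

```python
import numpy as np

def multihead_self_attention(memory, Wq, Wk, Wv, heads=2):
    """Each memory slot queries every other slot: the interaction
    step at the core of the Relational Memory Core."""
    n, d = memory.shape
    dh = d // heads
    q = (memory @ Wq).reshape(n, heads, dh)
    k = (memory @ Wk).reshape(n, heads, dh)
    v = (memory @ Wv).reshape(n, heads, dh)
    out = np.empty_like(q)
    for h in range(heads):
        scores = q[:, h] @ k[:, h].T / np.sqrt(dh)   # (n, n) slot-slot scores
        a = np.exp(scores - scores.max(axis=1, keepdims=True))
        a /= a.sum(axis=1, keepdims=True)            # softmax over slots
        out[:, h] = a @ v[:, h]
    return out.reshape(n, d)

rng = np.random.default_rng(0)
n_slots, d = 4, 8
mem = rng.normal(size=(n_slots, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
new_mem = multihead_self_attention(mem, Wq, Wk, Wv)
```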
Persistent RNNs: Stashing Recurrent Weights On-Chip
TLDR
This paper introduces a new technique for mapping deep recurrent neural networks efficiently onto GPUs, using persistent computational kernels that exploit the GPU's inverted memory hierarchy to reuse network weights over multiple timesteps.
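The property such kernels exploit is easy to see even in plain Python: the recurrent weights are identical at every timestep, so a persistent kernel can load them into the GPU's large register file once and keep them resident for the whole sequence, instead of re-reading them from off-chip memory on every step. A minimal numpy sketch of the reuse pattern:

```python
import numpy as np

# A vanilla RNN applies the *same* recurrent weights at every timestep.
# Persistent kernels exploit exactly this reuse: W_h stays resident
# on-chip across the entire loop rather than being re-fetched from DRAM.
def rnn_forward(x_seq, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in x_seq:                 # W_x, W_h reused every iteration
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
states = rnn_forward(rng.normal(size=(T, d_in)),
                     rng.normal(size=(d_in, d_h)),
                     rng.normal(size=(d_h, d_h)),
                     np.zeros(d_h))
```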
Learning and Evaluating General Linguistic Intelligence
TLDR
This work analyzes state-of-the-art natural language understanding models, conducts an extensive empirical evaluation against general linguistic intelligence criteria, and proposes a new evaluation metric, based on an online encoding of the test data, that quantifies how quickly a model learns a new task.
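The online-encoding idea is prequential: score each test example under the model fit only on earlier examples, then update and sum the negative log-probabilities, so models that adapt quickly pay fewer bits. A minimal sketch with a toy Bernoulli model standing in for a real learner; the `prob`/`update` interface is an assumption for illustration, not an API from the paper:

```python
import numpy as np

def online_codelength(model, examples):
    """Prequential/online code: pay -log2 p(y | x) for each example
    under the model trained only on earlier examples, then update."""
    bits = 0.0
    for x, y in examples:
        bits += -np.log2(model.prob(x, y))   # cost before seeing (x, y)
        model.update(x, y)                   # then learn from it
    return bits

class CoinModel:
    """Toy learner: a Laplace-smoothed Bernoulli over binary labels."""
    def __init__(self):
        self.counts = np.ones(2)
    def prob(self, x, y):
        return self.counts[y] / self.counts.sum()
    def update(self, x, y):
        self.counts[y] += 1

data = [(None, y) for y in [1, 1, 0, 1, 1, 1]]
print(f"{online_codelength(CoinModel(), data):.2f} bits")
```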
Towards Interpretable Reinforcement Learning Using Attention Augmented Agents
TLDR
This model uses a soft, top-down attention mechanism to create a bottleneck in the agent, forcing it to focus on task-relevant information by sequentially querying its view of the environment.
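A minimal sketch of the bottleneck idea: a top-down query vector scores every spatial location of the visual feature map, a softmax turns the scores into weights, and the recurrent core only receives the weighted readout. This simplifies the paper's architecture, which uses multiple queries and a spatial basis:

```python
import numpy as np

def soft_topdown_attention(features, query):
    """Soft top-down attention over a spatial feature map.
    features: (H, W, d) visual features; query: (d,) top-down query
    from the recurrent core. Returns a single d-dim readout, the
    information bottleneck the agent must act through."""
    H, W, d = features.shape
    flat = features.reshape(H * W, d)
    scores = flat @ query / np.sqrt(d)            # one score per location
    a = np.exp(scores - scores.max())
    a /= a.sum()                                  # attention weights
    return a @ flat                               # weighted spatial readout

rng = np.random.default_rng(0)
readout = soft_topdown_attention(rng.normal(size=(7, 7, 16)),
                                 rng.normal(size=16))
```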
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
TLDR
Non-Attentive Tacotron replaces the attention mechanism with an explicit duration predictor, which significantly improves robustness as measured by unaligned duration ratio and word deletion rate, two metrics introduced in this paper for large-scale robustness evaluation using a pre-trained speech recognition model.
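The predicted durations drive upsampling of the encoder states to frame rate, for which the paper uses Gaussian upsampling. A minimal numpy sketch follows, with a fixed range parameter where the paper predicts one per token:

```python
import numpy as np

def gaussian_upsample(enc, durations, sigma=1.0):
    """Upsample encoder states to frame rate using predicted durations.
    Each output frame is a Gaussian-weighted mix of encoder states
    centred at each token's midpoint (the paper predicts a per-token
    range parameter; a fixed sigma is used here for simplicity)."""
    ends = np.cumsum(durations)
    centers = ends - durations / 2.0              # token midpoints in frames
    T = int(ends[-1])
    t = np.arange(T)[:, None] + 0.5               # frame centres
    logits = -((t - centers[None, :]) ** 2) / (2 * sigma ** 2)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # (T, N) frame-token weights
    return w @ enc                                # (T, d) frame-rate states

rng = np.random.default_rng(0)
enc = rng.normal(size=(3, 8))                     # 3 tokens, 8-dim states
frames = gaussian_upsample(enc, np.array([2.0, 4.0, 3.0]))
print(frames.shape)                               # (9, 8)
```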
Towards Robust Image Classification Using Sequential Attention Models
TLDR
This paper adversarially trains and analyzes a neural model that incorporates a human-inspired visual attention component, guided by a recurrent top-down sequential process; this significantly improves adversarial robustness and yields state-of-the-art ImageNet accuracies under a wide range of random targeted attack strengths.
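Adversarial training typically pairs the model with a projected-gradient (PGD) attacker during training; a generic sketch of one such attack follows. The paper's exact attack setup may differ, and `grad_loss` is a hypothetical gradient oracle, not an API from the paper:

```python
import numpy as np

def pgd_attack(x, y, grad_loss, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent in the L-infinity ball: repeatedly
    step in the sign of the loss gradient, then project back so the
    perturbation stays within eps of the clean input."""
    rng = np.random.default_rng(0)
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss(x_adv, y))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)              # project to the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                      # keep a valid image
    return x_adv

# Toy usage with a linear "model" whose loss gradient is just its weights.
w = np.linspace(-1.0, 1.0, 12)
x_adv = pgd_attack(np.full(12, 0.5), 1, lambda x, y: (1 - 2 * y) * w)
```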