Publications
Deep Voice: Real-time Neural Text-to-Speech
tl;dr
We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks.
  • 284
  • 33
  • Open Access
Mixed Precision Training
tl;dr
We introduce a technique to train deep neural networks using half-precision floating-point numbers.
  • 323
  • 30
  • Open Access
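The tl;dr above only says that training runs in half precision. A minimal sketch of why that needs care, assuming two ingredients commonly used for FP16 training (a higher-precision master copy of the weights, and loss scaling) rather than reproducing the paper's full method: gradients much smaller than FP16's smallest positive subnormal value (2^-24, about 6e-8) round to zero, and scaling the loss before the backward pass shifts them back into range. `to_fp16` and `sgd_step_mixed` are hypothetical helpers; Python floats (FP64) stand in for FP32, and FP16 rounding is simulated with the standard library's `struct` half-precision format code `e`.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value and return it as a Python float."""
    # struct's "e" format packs a half-precision float; values too small for FP16
    # underflow to zero (or a subnormal), values too large raise OverflowError.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sgd_step_mixed(master_w: float, grad: float, lr: float, loss_scale: float) -> float:
    """One SGD step on a high-precision master weight with loss scaling (hypothetical helper).

    `grad` is the true gradient of the unscaled loss. Scaling the loss by `loss_scale`
    scales every gradient by the same factor, so an FP16 backward pass is simulated
    by rounding grad * loss_scale to FP16.
    """
    scaled_grad_fp16 = to_fp16(grad * loss_scale)
    # Unscale in higher precision before touching the master weight.
    unscaled_grad = scaled_grad_fp16 / loss_scale
    # The master weight stays in higher precision (a Python float here, i.e. FP64,
    # standing in for FP32), so small updates are not rounded away.
    return master_w - lr * unscaled_grad


if __name__ == "__main__":
    w = 1.0
    print(sgd_step_mixed(w, 1e-8, lr=1.0, loss_scale=1.0))     # gradient underflows: weight unchanged
    print(sgd_step_mixed(w, 1e-8, lr=1.0, loss_scale=1024.0))  # gradient survives: weight decreases
```

With `loss_scale=1.0`, a gradient of `1e-8` underflows to zero in FP16 and the weight never moves; with `loss_scale=1024.0` the scaled gradient (about 1e-5) survives as an FP16 subnormal and the update is applied with a relative error of roughly 0.1%. The master copy matters for a related reason: `to_fp16(1.0 - 1e-4)` returns `1.0`, because FP16 spacing just below 1.0 is 2^-11, so small updates applied directly to an FP16 weight can vanish.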
Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
tl;dr
This paper presents a high-level overview of the implementation of the Ocelot dynamic compiler, highlighting design decisions and trade-offs and showcasing their effect on application performance.
  • 225
  • 22
  • Open Access
Harmony: an execution model and runtime for heterogeneous many core systems
tl;dr
We propose Harmony, a runtime-supported programming and execution model that provides: (1) semantics for simplifying parallelism management, (2) dynamic scheduling of compute-intensive kernels to heterogeneous processor resources, and (3) online monitoring-driven performance optimization for heterogeneous many-core systems.
  • 189
  • 17
  • Open Access
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
tl;dr
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional speaker embeddings to generate different voices from a single model.
  • 149
  • 10
  • Open Access
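One generic way to get "different voices from a single model" is to give each speaker a learned low-dimensional vector and feed it into otherwise shared layers. The sketch below concatenates the speaker's embedding to a layer's input before an affine map; where and how Deep Voice 2 actually injects its embeddings is not described here, so treat the concatenation as an assumption. `SpeakerConditionedLayer` is hypothetical, and its weights are random rather than trained.

```python
import random
from typing import List, Sequence


class SpeakerConditionedLayer:
    """Hypothetical affine layer shared by all speakers: y = W @ [x; e_s] + b.

    Only e_s, a low-dimensional embedding looked up by speaker ID, differs between
    speakers; W and b are shared. Weights are random here, not trained.
    """

    def __init__(self, num_speakers: int, embed_dim: int, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Speaker embedding table: one embed_dim-sized vector per speaker.
        self.embeddings: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(num_speakers)
        ]
        # Shared weights act on the concatenation of input features and embedding.
        self.weight: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim + embed_dim)] for _ in range(out_dim)
        ]
        self.bias: List[float] = [0.0] * out_dim
        self.in_dim = in_dim

    def __call__(self, x: Sequence[float], speaker_id: int) -> List[float]:
        if len(x) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} input features, got {len(x)}")
        z = list(x) + self.embeddings[speaker_id]  # concatenate features and speaker vector
        return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(self.weight, self.bias)]
```

Calling the same layer on the same input with two different speaker IDs yields different outputs, while every parameter except the per-speaker vector is shared across speakers.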
A characterization and analysis of PTX kernels
tl;dr
We report on an analysis of over 50 kernels and applications, including the full NVIDIA CUDA SDK and UIUC's Parboil Benchmark Suite, covering control flow, data flow, parallelism, and memory behavior.
  • 128
  • 10
  • Open Access
Deep Learning Scaling is Predictable, Empirically
tl;dr
This paper presents the largest-scale empirical characterization of learning curves to date. It reveals that, broadly, deep learning (DL) generalization error does show power-law improvement, but with exponents that must be predicted empirically.
  • 113
  • 7
  • Open Access
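A worked sketch of what "power-law improvement" means, assuming the pure power-law form ε(m) ≈ α·m^β with β < 0 and ignoring any irreducible error floor. Here m is taken to be training set size (an assumption; the tl;dr does not name the scaling variable). Taking logs gives log ε = log α + β·log m, so the exponent β is the slope of an ordinary least-squares line through the log-log points. `fit_power_law` and `predict_error` are hypothetical helpers, and the data in the usage example is synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit errors ≈ alpha * sizes**beta by least squares on (log m, log error) pairs."""
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs of equal length")
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Slope of the log-log regression line is the power-law exponent beta.
    beta = sxy / sxx
    # Intercept of the line is log(alpha).
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def predict_error(alpha: float, beta: float, m: float) -> float:
    """Extrapolate the fitted curve to a new size m."""
    return alpha * m ** beta


if __name__ == "__main__":
    sizes = [1e3, 1e4, 1e5, 1e6]
    errors = [3.0 * m ** -0.35 for m in sizes]  # synthetic data following an exact power law
    alpha, beta = fit_power_law(sizes, errors)
    print(alpha, beta, predict_error(alpha, beta, 1e7))
```

On exact power-law data the fit recovers α = 3.0 and β = -0.35 up to floating-point rounding; on real learning curves the points only follow a straight line in the power-law region, which is why the exponent has to be measured rather than assumed.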
Simultaneous branch and warp interweaving for sustained GPU performance
tl;dr
We present two complementary techniques that mitigate the impact of thread divergence on SIMT micro-architectures.
  • 86
  • 7
  • Open Access
Block-Sparse Recurrent Neural Networks
tl;dr
We investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer, and using group lasso regularization with pruning to create blocks of zeros.
  • 46
  • 6
  • Open Access
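A minimal sketch of the two ingredients the tl;dr names, on a plain nested-list weight matrix: a group lasso penalty (λ times the sum of per-block L2 norms, which pushes whole blocks toward zero during training) and a pruning pass that zeroes every block whose score falls below a threshold. Scoring blocks by their L2 norm and using a single fixed threshold are assumptions of this sketch; the paper's actual block criterion and threshold schedule may differ. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def block_norms(w: Matrix, bh: int, bw: int) -> Matrix:
    """L2 norm of each bh x bw block of w (dimensions must divide evenly)."""
    rows, cols = len(w), len(w[0])
    if rows % bh or cols % bw:
        raise ValueError("block shape must evenly divide the matrix shape")
    norms = []
    for bi in range(0, rows, bh):
        row_norms = []
        for bj in range(0, cols, bw):
            sq = sum(w[i][j] ** 2 for i in range(bi, bi + bh) for j in range(bj, bj + bw))
            row_norms.append(math.sqrt(sq))
        norms.append(row_norms)
    return norms


def group_lasso_penalty(w: Matrix, bh: int, bw: int, lam: float) -> float:
    """lam * sum of block L2 norms; added to the training loss to encourage zero blocks."""
    return lam * sum(sum(r) for r in block_norms(w, bh, bw))


def prune_blocks(w: Matrix, bh: int, bw: int, threshold: float) -> Matrix:
    """Return a copy of w with every block whose L2 norm is below threshold set to zero."""
    out = [row[:] for row in w]  # copy so the caller's matrix is untouched
    for bi, row_norms in enumerate(block_norms(w, bh, bw)):
        for bj, norm in enumerate(row_norms):
            if norm < threshold:
                for i in range(bi * bh, (bi + 1) * bh):
                    for j in range(bj * bw, (bj + 1) * bw):
                        out[i][j] = 0.0
    return out
```

Zeroing whole blocks, rather than scattered individual weights, leaves a structure that block-sparse kernels can skip in contiguous tiles.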
Modeling GPU-CPU workloads and systems
tl;dr
Heterogeneous systems, systems with multiple processors tailored for specialized tasks, are challenging programming environments.
  • 98
  • 5
  • Open Access