Publications
Deep Voice: Real-time Neural Text-to-Speech
TLDR
Deep Voice lays the groundwork for truly end-to-end neural speech synthesis, shows that inference with the system can run faster than real time, and describes optimized WaveNet inference kernels for both CPU and GPU that achieve up to 400x speedups over existing implementations.
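A well-known route to such speedups is avoiding redundant computation during autoregressive sampling by caching each dilated layer's past activations, so generating a new sample costs O(layers) rather than a full receptive-field recompute. Below is a minimal NumPy sketch of that general idea; the layer structure, sizes, and names are illustrative assumptions, not Deep Voice's actual CPU/GPU kernels.

```python
import numpy as np

class CachedDilatedLayer:
    """One dilated causal conv layer (kernel size 2) with a rolling buffer
    of past inputs, so each new sample reuses cached history instead of
    recomputing the whole receptive field."""
    def __init__(self, channels, dilation, rng):
        self.dilation = dilation
        self.w_past = rng.standard_normal((channels, channels)) * 0.1
        self.w_now = rng.standard_normal((channels, channels)) * 0.1
        self.buffer = np.zeros((dilation, channels))  # last `dilation` inputs
        self.pos = 0

    def step(self, x):
        past = self.buffer[self.pos]           # input from `dilation` steps ago
        self.buffer[self.pos] = x              # overwrite the oldest slot
        self.pos = (self.pos + 1) % self.dilation
        return np.tanh(past @ self.w_past + x @ self.w_now)

rng = np.random.default_rng(0)
layers = [CachedDilatedLayer(16, 2 ** i, rng) for i in range(8)]
x = rng.standard_normal(16)
for _ in range(100):                # generate 100 samples, O(layers) work each
    h = x
    for layer in layers:
        h = layer.step(h)
    x = h                           # feed the output back in (toy autoregression)
```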
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
TLDR
Deep Voice 3 is a fully-convolutional attention-based neural text-to-speech (TTS) system that matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster.
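As a rough illustration of the attention mechanism such a system relies on, here is a minimal NumPy sketch of single-query scaled dot-product attention over encoder timesteps, which is how attention-based TTS aligns decoder frames with text positions; shapes and names are hypothetical, not Deep Voice 3's actual modules.

```python
import numpy as np

def dot_product_attention(query, keys, values):
    """Single-query attention over encoder timesteps: score every text
    position, normalize, and return a weighted context vector."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values, weights    # context vector + alignment weights

rng = np.random.default_rng(0)
enc = rng.standard_normal((40, 64))    # 40 text positions, 64-dim encodings
q = rng.standard_normal(64)            # one decoder query
context, alignment = dot_product_attention(q, enc, enc)
```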
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
TLDR
It is shown that a single neural TTS system can learn hundreds of unique voices from less than half an hour of data per speaker, while achieving high-quality audio synthesis and preserving speaker identities almost perfectly.
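A common way to get many voices from one shared model is a learned per-speaker embedding that conditions the shared layers, so all speakers share weights and only the embedding row differs. The NumPy sketch below shows that pattern; the table size, projection, and multiplicative conditioning site are assumptions for illustration, not Deep Voice 2's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, emb_dim, hidden = 500, 16, 64
speaker_table = rng.standard_normal((n_speakers, emb_dim)) * 0.1
proj = rng.standard_normal((emb_dim, hidden)) * 0.1

def speaker_conditioned(h, speaker_id):
    """Condition shared-model activations on a per-speaker embedding;
    only `speaker_table[speaker_id]` distinguishes one voice from another."""
    s = speaker_table[speaker_id] @ proj
    return h * (1.0 + np.tanh(s))       # multiplicative conditioning (one option)

h = rng.standard_normal((10, hidden))   # 10 frames of activations
out = speaker_conditioned(h, speaker_id=42)
```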
TabNet: Attentive Interpretable Tabular Learning
TLDR
It is demonstrated that TabNet outperforms other neural-network and decision-tree variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions as well as insight into global model behavior.
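TabNet's attributions come from learned attention masks applied over sequential decision steps, with a prior that discourages reusing the same features. Below is a toy NumPy sketch of that selection loop; softmax stands in for TabNet's sparsemax, and the weights and relaxation constant are simplified placeholders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_features, steps = 8, 3
x = rng.standard_normal(n_features)
prior = np.ones(n_features)                # discourages reusing features
W = rng.standard_normal((steps, n_features, n_features)) * 0.3

for step in range(steps):
    mask = softmax(W[step] @ x) * prior    # attend to a subset of features
    mask /= mask.sum()
    selected = mask * x                    # masked features feed this step
    prior *= (1.25 - mask)                 # relax prior where features were used
    print(f"step {step}: top feature = {mask.argmax()}")
# The per-step masks are what make the feature attributions inspectable.
```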
Neural Voice Cloning with a Few Samples
TLDR
While speaker adaptation achieves better naturalness and similarity, the speaker-encoding approach requires significantly less cloning time and memory, making it favorable for low-resource deployment.
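The speed and memory advantage of speaker encoding is that cloning reduces to a forward pass plus pooling, with no per-speaker fine-tuning or stored per-speaker weights. A minimal NumPy sketch of that route, with a placeholder encoder standing in for the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((80, 16)) * 0.1   # frozen encoder (placeholder weights)

def encode_speaker(cloning_samples):
    """Speaker-encoding route: map a handful of audio features to a single
    speaker embedding with one forward pass, no fine-tuning required."""
    embeddings = [np.tanh(s @ W) for s in cloning_samples]
    return np.mean(embeddings, axis=0)    # pool across the few samples

samples = [rng.standard_normal(80) for _ in range(5)]   # e.g. 5 short clips
speaker_embedding = encode_speaker(samples)             # usable immediately
```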
Deep Voice 3: 2000-Speaker Neural Text-to-Speech
TLDR
Deep Voice 3 is a fully-convolutional attention-based neural text-to-speech (TTS) system that matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster.
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
TLDR
A convolutional recurrent neural network (CRNN) for small-footprint keyword spotting (KWS) is described; the CRNN model demonstrates high accuracy and robust performance across a wide range of environments.
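The CRNN pattern pairs a convolutional front end over the spectrogram with a recurrent layer that summarizes the utterance for a keyword decision. A toy NumPy sketch of that pipeline follows, with made-up filter sizes and a plain tanh RNN in place of the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_mels, conv_ch, hid = 50, 40, 8, 32
spec = rng.standard_normal((T, n_mels))          # log-mel spectrogram

# Convolutional front end: small filters over frequency, shared across time.
filt = rng.standard_normal((conv_ch, 5)) * 0.1
conv = np.stack([
    np.array([filt @ spec[t, f:f + 5] for f in range(n_mels - 4)]).mean(axis=0)
    for t in range(T)
])                                               # (T, conv_ch) features

# Recurrent back end summarizes the whole utterance into one state.
Wx = rng.standard_normal((conv_ch, hid)) * 0.1
Wh = rng.standard_normal((hid, hid)) * 0.1
h = np.zeros(hid)
for t in range(T):
    h = np.tanh(conv[t] @ Wx + h @ Wh)

w_out = rng.standard_normal(hid) * 0.1
p_keyword = 1.0 / (1.0 + np.exp(-h @ w_out))     # P(keyword | utterance)
```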
Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
TLDR
The Temporal Fusion Transformer (TFT), a novel attention-based architecture, combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics; three practical interpretability use cases of TFT are showcased.
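One of TFT's interpretability hooks is that temporal attention weights can be read off directly to see which past steps drive each forecast horizon. A simplified NumPy sketch of that idea, using single-head attention with toy horizon-dependent queries rather than the actual TFT architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
past, horizon, d = 30, 7, 16
hist = rng.standard_normal((past, d))        # encoded past covariates
Wq = rng.standard_normal((d, d)) * 0.1
w_out = rng.standard_normal(d) * 0.1

forecasts, attn = [], []
for h in range(horizon):                     # one query per horizon step
    q = np.tanh(hist[-1] @ Wq) + 0.01 * h    # toy horizon-dependent query
    a = softmax(hist @ q / np.sqrt(d))       # inspectable temporal weights
    forecasts.append((a @ hist) @ w_out)
    attn.append(a)
# Each row of `attn` shows which past steps drive that horizon's prediction.
```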
Distilling Effective Supervision From Severe Label Noise
TLDR
This paper presents a holistic framework to train deep neural networks in a way that is highly robust to label noise and achieves excellent performance on large-scale datasets with real-world label noise.
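A common family of techniques in this setting down-weights likely-mislabeled examples during training. The NumPy sketch below uses a generic small-loss reweighting heuristic on logistic regression to illustrate the idea; this is not the paper's actual procedure, just the simplest instance of trusting low-loss examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true > 0).astype(float)
flip = rng.random(n) < 0.3                    # inject 30% label noise
y[flip] = 1.0 - y[flip]

w = np.zeros(d)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    weights = (loss < np.quantile(loss, 0.7)).astype(float)  # trust small-loss
    grad = X.T @ ((p - y) * weights) / weights.sum()
    w -= 0.5 * grad
```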
Effect of Mode Coupling on Signal Processing Complexity in Mode-Division Multiplexing
Mode-division multiplexing systems employ multi-input multi-output (MIMO) equalization to compensate for chromatic dispersion (CD), modal dispersion (MD), and modal crosstalk. The computational …
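To make the complexity point concrete, here is a NumPy sketch of time-domain MIMO equalization over two modes: per-symbol cost scales with the tap count times the square of the mode count, and the tap count grows with the dispersion memory being compensated. All sizes and tap values are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
modes, taps, T = 2, 9, 1000
rx = rng.standard_normal((T, modes))                  # received mode signals
W = rng.standard_normal((taps, modes, modes)) * 0.1   # MIMO FIR tap matrix

def mimo_equalize(rx, W):
    """Time-domain MIMO equalization: each output mode is a weighted sum
    over all input modes and `taps` past samples, so per-symbol cost grows
    as taps x modes^2 (hence with modal dispersion memory)."""
    T, modes = rx.shape
    taps = W.shape[0]
    out = np.zeros_like(rx)
    for t in range(taps - 1, T):
        window = rx[t - taps + 1 : t + 1][::-1]       # newest sample first
        out[t] = np.einsum('kij,ki->j', W, window)
    return out

eq = mimo_equalize(rx, W)
```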