Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks

@article{Arik2019FastSI,
  title={Fast Spectrogram Inversion Using Multi-Head Convolutional Neural Networks},
  author={Sercan {\"O}. Arik and Heewoo Jun and G. Diamos},
  journal={IEEE Signal Processing Letters},
  year={2019},
  volume={26},
  pages={94--98}
}
We propose the multi-head convolutional neural network (MCNN) for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN enables significantly better utilization of modern multi-core processors than commonly used iterative algorithms like Griffin–Lim, and yields very fast (more than 300× real time) runtime. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms…
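The idea of upsampling with transposed convolutions in parallel heads can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`transposed_conv1d`, `head`, `multi_head_upsample`), the single-channel input, the ELU nonlinearity, the random weights, and the choice to combine heads by a plain sum are all hypothetical.

```python
import math
import random


def transposed_conv1d(x, kernel, stride):
    """Single-channel transposed 1-D convolution: every input sample adds a
    scaled copy of the kernel to the output, shifted by `stride` per sample."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk
    return out


def elu(values):
    """Elementwise ELU nonlinearity (an assumed choice for this sketch)."""
    return [v if v > 0 else math.exp(v) - 1.0 for v in values]


def head(x, kernels, stride):
    """One head: stacked transposed convolutions, nonlinearity between layers."""
    for layer, kernel in enumerate(kernels):
        x = transposed_conv1d(x, kernel, stride)
        if layer < len(kernels) - 1:
            x = elu(x)
    return x


def multi_head_upsample(x, heads, stride):
    """Run each head independently and sum the outputs sample by sample.
    The heads share no state, so they could be evaluated in parallel."""
    outputs = [head(x, kernels, stride) for kernels in heads]
    return [sum(samples) for samples in zip(*outputs)]


# Toy usage: 5 input frames, 3 heads, 2 layers per head, kernel width 4.
random.seed(0)
frames = [random.random() for _ in range(5)]
heads = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
         for _ in range(3)]
waveform = multi_head_upsample(frames, heads, stride=2)
print(len(waveform))  # 26: two stride-2 layers map n frames to 4 * n + 6 samples
```

Because each head depends only on the shared input, the per-head loop is the natural place to spread work across cores, which is the property the abstract contrasts with iterative algorithms such as Griffin–Lim.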
42 Citations
Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks
  • 2 citations
Improved Parallel WaveGAN Vocoder with Perceptually Weighted Spectrogram Loss
  • 1 citation
WaveGlow: A Flow-based Generative Network for Speech Synthesis
  • 341 citations
Robust universal neural vocoding
  • 17 citations
Parallel Neural Text-to-Speech
  • 24 citations
Efficient Neural Networks for Real-time Analog Audio Effect Modeling
  • 1 citation
Universal Neural Vocoding with Parallel WaveNet
MelGlow: Efficient Waveform Generative Network Based On Location-Variable Convolution
  • 4 citations
Unsupervised Cross-Domain Singing Voice Conversion
  • 6 citations
Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram
  • Ryuichi Yamamoto, Eunwoo Song, J. Kim
  • Computer Science, Engineering
  • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
  • 120 citations
