On the Design of Deep Priors for Unsupervised Audio Restoration

Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Andreas Spanias
Unsupervised deep learning methods for solving audio restoration problems rely extensively on carefully tailored neural architectures that carry strong inductive biases for defining priors in the time or spectral domain. In this context, much of the recent success has been achieved with sophisticated convolutional network constructions that recover audio signals in the spectral domain. However, in practice, audio priors require careful engineering of the convolutional kernels to be effective at…
1 Citation

Deep Audio Waveform Prior

This work shows that existing state-of-the-art (SOTA) architectures for audio source separation contain deep priors even when working with the raw waveform.



Unsupervised Audio Source Separation using Generative Priors

This work proposes an approach to audio source separation based on generative priors trained on individual sources; it searches the source-specific latent spaces simultaneously, using projected gradient descent optimization, to recover the constituent sources.

Deep Image Prior

It is shown that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting.

Deep Audio Priors Emerge From Harmonic Convolutional Networks

Harmonic Convolution is proposed, an operation that helps deep networks distill priors in audio signals by explicitly exploiting their harmonic structure: the convolutional kernel is engineered to be supported by sets of harmonic series instead of local neighborhoods.

SEGAN: Speech Enhancement Generative Adversarial Network

This work proposes the use of generative adversarial networks for speech enhancement. The model operates at the waveform level, is trained end-to-end, and incorporates 28 speakers and 40 different noise conditions, such that model parameters are shared across them.

Deep Long Audio Inpainting

This work takes a pioneering step, exploring the possibility of adapting deep learning frameworks from various domains, including audio synthesis and image inpainting, for audio inpainting, and examining how factors such as mask size, receptive field, and audio representation affect performance.

Adversarial Audio Synthesis

WaveGAN is a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio, capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation.

Solving Linear Inverse Problems Using GAN Priors: An Algorithm with Provable Guarantees

  • Viraj Shah, C. Hegde
  • Computer Science, Mathematics
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
This work proposes a projected gradient descent (PGD) algorithm for effective use of GAN priors for linear inverse problems, and provides theoretical guarantees on the rate of convergence of this algorithm.
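
The projected gradient descent idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: a toy linear map `G(z) = W z` stands in for a trained GAN generator (an assumption made so that the projection step has a closed-form least-squares solution), and the names `pgd_recover` and `project_onto_generator_range`, the problem sizes, and the step size are all hypothetical choices for illustration. Each iteration takes a gradient step on the measurement loss and then projects the result back onto the range of the generator.

```python
import numpy as np

def project_onto_generator_range(w, W):
    # Toy stand-in for projecting onto the range of a generator G(z) = W z:
    # solve min_z ||w - W z||_2 by least squares and map back through G.
    z, *_ = np.linalg.lstsq(W, w, rcond=None)
    return W @ z

def pgd_recover(y, A, W, n_iters=1000):
    # Projected gradient descent for the linear inverse problem y = A x,
    # with x constrained to the range of the (toy) generator.
    eta = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the spectral norm of A
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        w = x + eta * A.T @ (y - A @ x)      # gradient step on 0.5 * ||y - A x||^2
        x = project_onto_generator_range(w, W)  # projection onto the prior
    return x

# Synthetic example (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n, k, m = 50, 5, 25                          # signal dim, latent dim, measurements
W = rng.standard_normal((n, k))              # toy linear "generator"
A = rng.standard_normal((m, n)) / np.sqrt(m) # random measurement matrix
x_true = W @ rng.standard_normal(k)          # signal in the generator's range
y = A @ x_true                               # noiseless compressed measurements

x_hat = pgd_recover(y, A, W)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative recovery error: {rel_err:.2e}")
```

With a real nonlinear generator, the projection step would itself be an inner optimization over the latent code, typically carried out by gradient descent rather than solved in closed form.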

Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

A simple convolutional and recurrent model is introduced that outperforms the state-of-the-art waveform model, Wave-U-Net, by 1.6 points of SDR (signal-to-distortion ratio), and a new scheme to leverage unlabeled music is proposed.

WaveNet: A Generative Model for Raw Audio

WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.

Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

The Wave-U-Net is proposed, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. Results indicate that its architecture yields performance comparable to a state-of-the-art spectrogram-based U-Net architecture, given the same data.