Corpus ID: 209445167

Preprint-work in progress
Chenliang Xu
Deep convolutional neural networks are known to specialize in distilling compact and robust priors from large amounts of data. We are interested in applying deep networks in the absence of a training dataset. In this paper, we introduce deep audio prior (DAP), which leverages the structure of a network and the temporal information in a single audio file. Specifically, we demonstrate that a randomly-initialized neural network can be used with a carefully designed audio prior to tackle challenging…
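The core deep-prior idea can be illustrated with a toy numpy sketch: fit a small randomly initialized network to a single noisy signal by gradient descent, using the network's structure (plus early stopping) as the only regularizer. This is a minimal stand-in, not the DAP architecture, which operates on spectrograms with convolutional networks; the two-layer tanh MLP, learning rate, and step count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" signal: one cycle of a sine corrupted by noise.
t = np.linspace(0.0, 1.0, 256)
clean = np.sin(2.0 * np.pi * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

# Small randomly initialized network: fixed input features, two layers.
X = np.stack([t, np.ones_like(t)], axis=1)  # (256, 2): time coordinate + bias
W1 = 3.0 * rng.standard_normal((2, 32))
W2 = 0.1 * rng.standard_normal((32, 1))

lr = 0.05
initial_loss = None
for step in range(4000):  # early stopping would act as the regularizer
    h = np.tanh(X @ W1)                       # (256, 32) hidden features
    out = (h @ W2).ravel()                    # network's current signal estimate
    err = out - noisy                         # fit the *noisy* observation
    loss = np.mean(err ** 2)
    if initial_loss is None:
        initial_loss = loss
    # Manual backprop through the two layers.
    g_out = (2.0 / err.size) * err[:, None]   # dLoss/dout
    gW2 = h.T @ g_out
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)     # tanh' = 1 - tanh^2
    gW1 = X.T @ g_h
    W1 -= lr * gW1
    W2 -= lr * gW2

final_loss = loss
```

Because the smooth network tends to capture low-frequency structure before the noise, stopping the fit early yields a denoised estimate; this smoothness bias is the prior.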


Deep Image Prior
It is shown that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting.
Audio Denoising with Deep Network Priors
A method for audio denoising that combines processing in both the time domain and the time-frequency domain, and trains only on the specific audio clip being denoised.
SEGAN: Speech Enhancement Generative Adversarial Network
This work proposes generative adversarial networks for speech enhancement. It operates at the waveform level, trains the model end-to-end, and incorporates 28 speakers and 40 different noise conditions into a single model whose parameters are shared across them.
Deep clustering: Discriminative embeddings for segmentation and separation
Preliminary experiments on single-channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker mixtures can improve signal quality for mixtures of held-out speakers by an average of 6 dB, and the same model does surprisingly well on three-speaker mixtures.
Monoaural Audio Source Separation Using Deep Convolutional Neural Networks
A low-latency monaural source separation framework using a convolutional neural network, evaluated on a database comprising musical mixtures of three instruments as well as other instruments that vary from song to song.
CNN architectures for large-scale audio classification
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on this audio classification task, and that larger training and label sets help up to a point.
Unsupervised Learning of Semantic Audio Representations
  • A. Jansen, M. Plakal, R. Saurous
  • Computer Science
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
This work considers several class-agnostic semantic constraints that apply to unlabeled non-speech audio and proposes low-dimensional embeddings of the input spectrograms that recover 41% and 84% of the performance of their fully-supervised counterparts on downstream query-by-example sound retrieval and sound event classification tasks, respectively.
Deep attractor network for single-microphone speaker separation
  • Zhuo Chen, Yi Luo, N. Mesgarani
  • Computer Science
    2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
A novel deep learning framework for single-channel speech separation that creates attractor points in the high-dimensional embedding space of the acoustic signals, pulling together the time-frequency bins corresponding to each source.
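The attractor mechanism can be sketched in a few lines of numpy. This is a toy illustration, not the paper's trained network: a real deep attractor network learns the embeddings, whereas here we draw two separable clusters, and the softmax-over-distance masking below is a simple stand-in for the paper's similarity-based masks.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 100 time-frequency bins embedded in 8 dimensions, 2 sources.
n_bins, dim = 100, 8
labels = rng.integers(0, 2, n_bins)                  # oracle source of each bin
emb = rng.standard_normal((n_bins, dim)) + 3.0 * labels[:, None]

# Attractor = centroid of the embeddings assigned to each source.
one_hot = np.eye(2)[labels]                          # (n_bins, 2)
attractors = (one_hot.T @ emb) / one_hot.sum(0)[:, None]   # (2, dim)

# Soft mask per bin: softmax over negative squared distances to the
# attractors, so each bin is pulled toward its nearest attractor.
d2 = ((emb[:, None, :] - attractors[None, :, :]) ** 2).sum(-1)  # (n_bins, 2)
logits = -d2
logits -= logits.max(axis=1, keepdims=True)          # numerical stability
masks = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

pred = masks.argmax(axis=1)
accuracy = (pred == labels).mean()
```

Applying each column of `masks` to the mixture spectrogram would then recover the per-source estimates.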
Deep NMF for speech separation
Deep NMF is proposed, a novel non-negative deep network architecture that results from unfolding the NMF iterations and untying their parameters; it improves upon NMF in accuracy and is competitive with conventional sigmoid deep neural networks while requiring a tenth as many parameters.
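The iterations being unfolded are the standard multiplicative NMF updates. A minimal numpy sketch of those updates on a toy nonnegative "spectrogram" is below; this shows plain Euclidean-cost NMF (Lee–Seung style), not Deep NMF itself, where each loop iteration would become a network layer with its own untied parameters trained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy nonnegative "spectrogram": exact mixture of two spectral patterns.
F, T, K = 20, 50, 2
W_true = rng.random((F, K))
H_true = rng.random((K, T))
V = W_true @ H_true                  # observed magnitude spectrogram

# Multiplicative updates for the Euclidean cost ||V - WH||^2.
W = rng.random((F, K)) + 0.1
H = rng.random((K, T)) + 0.1
eps = 1e-9                           # guards against division by zero
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

recon_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because every factor stays nonnegative under these ratio updates, unfolding K of them into layers preserves the nonnegativity that makes the learned dictionary interpretable as spectral templates.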
Self-Supervised Intrinsic Image Decomposition
This paper proposes a model that joins an image-decomposition pipeline, which predicts reflectance, shape, and lighting from a single image, with a recombination function: a learned shading model that recomposes the original input from the intrinsic-image predictions, letting the network use unsupervised reconstruction error as an additional signal to improve its intermediate representations.