Corpus ID: 220969060

Learning to Denoise Historical Music

@inproceedings{Li2020LearningTD,
  title={Learning to Denoise Historical Music},
  author={Yunpeng Li and Beat Gfeller and Marco Tagliasacchi and Dominik Roblek},
  booktitle={ISMIR},
  year={2020}
}
We propose an audio-to-audio neural network model that learns to denoise old music recordings. Our model internally converts its input into a time-frequency representation by means of a short-time Fourier transform (STFT), and processes the resulting complex spectrogram using a convolutional neural network. The network is trained with both reconstruction and adversarial objectives on a synthetic noisy music dataset, which is created by mixing clean music with real noise samples extracted from… 
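The abstract describes a three-part pipeline: synthesize noisy training data by mixing clean music with real noise samples, convert the waveform to a complex STFT spectrogram, and process that spectrogram with a network before inverting back to audio. A minimal sketch of that data flow, not the authors' code, is below; the sample rate, SNR target, and the identity `denoise` placeholder (standing in for the convolutional network) are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation): mix clean audio with
# a noise sample at a target SNR, take the complex STFT, run a placeholder
# "denoise" step, and reconstruct the waveform with the inverse STFT.
import numpy as np
from scipy.signal import stft, istft

fs = 16000  # sample rate in Hz; an assumption for illustration

def make_noisy_example(clean, noise, snr_db=10.0):
    """Mix clean music with a noise sample at a target SNR (dB)."""
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def denoise(complex_spec):
    """Placeholder for the convolutional network that would operate on the
    complex spectrogram; here it simply returns its input unchanged."""
    return complex_spec

# Toy signals standing in for clean music and an extracted noise sample.
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
noise = 0.1 * np.random.randn(fs)

noisy = make_noisy_example(clean, noise, snr_db=10.0)
_, _, spec = stft(noisy, fs=fs, nperseg=512)          # complex spectrogram
_, restored = istft(denoise(spec), fs=fs, nperseg=512)  # back to audio
```

With the identity placeholder, `istft(stft(x))` reconstructs the input (up to edge padding), which makes the round-trip a useful sanity check before swapping in a real model.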

Citations of this paper

A Two-stage U-Net for high-fidelity denoising of historical recordings
A novel denoising method based on a fully convolutional deep neural network that is trained using realistic noisy data to jointly remove hiss, clicks, thumps, and other common additive disturbances from old analog discs.
Automatic Quality Assessment of Digitized and Restored Sound Archives
A framework to assess the quality of experience (QoE) of sound archives in an automatic fashion is presented, along with the reasons why stakeholders, such as archivists, broadcasters, or public listeners, would benefit from the proposed framework.
BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks
The results of a formal blind listening test show that BEHM-GAN increases the perceptual sound quality of early-20th-century gramophone recordings and represents a relevant step toward data-driven music restoration in real-world scenarios.
CycleGAN-Based Unpaired Speech Dereverberation
A CycleGAN-based approach that enables dereverberation models to be trained on unpaired data is proposed, and it is shown that the performance of the unpaired model is comparable to the performance of the paired model on two different datasets, according to objective evaluation metrics.
Convergent evolution in a large cross-cultural database of musical scales
Scales, sets of discrete pitches used to generate melodies, are thought to be one of the most universal features of music. Despite this, we know relatively little about how cross-cultural diversity…
ASSESSMENTS: A REVIEW
Image restoration is the process of restoring the original image from a degraded one. Images can be affected by various types of noise, such as Gaussian noise and impulse noise…
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
It is illustrated that capturing the essence of an audio source is typically possible from as little as a few tens of seconds of a single training signal, using a GAN-based generative model that can be trained on one short audio signal from any domain and does not require pre-training or any other form of external supervision.
On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks
A data augmentation strategy is proposed which applies multiple low-pass filters during training and leads to improved generalization to unseen filtering conditions at test time, even though the enhanced output can have a lower SNR than the band-limited input.
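The augmentation idea summarized above can be sketched as follows: each training example is band-limited with a randomly drawn low-pass filter so the model never overfits to a single filtering condition. The filter families, cutoff range, and order range here are illustrative assumptions, not that paper's exact configuration.

```python
# Hedged sketch of random low-pass-filter data augmentation: draw a random
# cutoff, order, and filter family per example, then band-limit the signal.
import numpy as np
from scipy.signal import butter, cheby1, sosfilt

fs = 16000  # sample rate in Hz; an assumption

def random_lowpass(x, rng):
    """Apply a randomly parameterized low-pass filter to signal x."""
    cutoff = rng.uniform(2000.0, 6000.0)   # random cutoff frequency (Hz)
    order = int(rng.integers(2, 9))        # random filter order, 2..8
    if rng.random() < 0.5:
        sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    else:
        # Chebyshev type I with 1 dB passband ripple, as a second family.
        sos = cheby1(order, 1.0, cutoff, btype="low", fs=fs, output="sos")
    return sosfilt(sos, x)

rng = np.random.default_rng(0)
x = rng.standard_normal(fs)        # stand-in for a one-second music excerpt
x_lp = random_lowpass(x, rng)      # band-limited training input
```

Second-order-sections (`sos`) filtering is used because high-order transfer functions in `(b, a)` form can be numerically unstable.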
One-Shot Conditional Audio Filtering of Arbitrary Sounds
We consider the problem of separating a particular sound source from a single-channel mixture, based on only a short sample of the target source (from the same recording). Using SoundFilter, a…
Micaugment: One-Shot Microphone Style Transfer
The proposed one-shot microphone style transfer method is shown to successfully apply the style transfer to real audio and to significantly increase model robustness when used as data augmentation in downstream tasks.
