Corpus ID: 233241038

EnvGAN: Adversarial Synthesis of Environmental Sounds for Data Augmentation

@article{Madhu2021EnvGANAS,
  title={EnvGAN: Adversarial Synthesis of Environmental Sounds for Data Augmentation},
  author={Aswathy Madhu and Suresh Kirthi Kumaraswamy},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.07326}
}
The research in Environmental Sound Classification (ESC) has been growing steadily with the emergence of deep learning algorithms. However, data scarcity remains a major hurdle to further advances in this domain. Data augmentation offers an excellent solution to this problem. While Generative Adversarial Networks (GANs) have been successful in generating synthetic speech and sounds of musical instruments, they have hardly been applied to the generation of environmental sounds. This paper… 
1 Citation

Dreamsound: Deep Activation Layer Sonification

This paper presents DreamSound, a creative adaptation of Deep Dream to sound, addressed from two approaches: input manipulation and sonification design. The chosen model is YAMNet, a pre-trained deep network for sound classification.
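
For readers unfamiliar with the input-manipulation side of Deep Dream, the sketch below shows the basic idea applied to audio: gradient ascent on the input so that it increasingly excites a chosen layer of a pretrained classifier. The tiny network here is a hypothetical stand-in, not YAMNet, whose loading API is not shown in the source.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained audio classifier (e.g. YAMNet);
# the real model and its loading API are not part of the source.
class TinyAudioNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

def dream_step(model, waveform, lr=0.05):
    """One gradient-ascent step nudging the input toward stronger
    activations in the chosen layer (Deep Dream style)."""
    waveform = waveform.clone().detach().requires_grad_(True)
    activation = model(waveform)
    loss = activation.norm()          # maximise the layer's activation energy
    loss.backward()
    with torch.no_grad():
        waveform += lr * waveform.grad / (waveform.grad.norm() + 1e-8)
    return waveform.detach()

model = TinyAudioNet().eval()
audio = torch.randn(1, 1, 16000)      # one second of audio at 16 kHz
for _ in range(20):
    audio = dream_step(model, audio)
```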

References

SHOWING 1-10 OF 52 REFERENCES

Data Augmentation Using Generative Adversarial Network for Environmental Sound Classification

A deep learning framework employing a convolutional neural network (CNN) for automatic environmental sound classification is presented, along with a novel technique for audio data augmentation using a generative adversarial network (GAN).
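
A minimal sketch of the augmentation idea described here, assuming an already trained generator that maps latent vectors to log-mel spectrograms; all layer sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative GAN-based augmentation: a (hypothetical) generator maps latent
# vectors to log-mel spectrograms, and its samples are appended to the real
# training set.
latent_dim, n_mels, n_frames = 100, 64, 128

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, n_mels * n_frames), nn.Tanh(),
)

def synthesize(n_samples):
    z = torch.randn(n_samples, latent_dim)
    with torch.no_grad():
        fake = generator(z).view(n_samples, 1, n_mels, n_frames)
    return fake

real_specs = torch.randn(500, 1, n_mels, n_frames)     # placeholder real data
augmented = torch.cat([real_specs, synthesize(250)])   # add 50% synthetic samples
```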

Adversarial Audio Synthesis

WaveGAN is a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio, capable of synthesizing one-second slices of audio with global coherence, suitable for sound effect generation.
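
A rough sketch of a WaveGAN-style generator, assuming 1-D transposed convolutions that upsample a latent vector directly to a raw waveform; the published model is deeper and wider than this toy version.

```python
import torch
import torch.nn as nn

# WaveGAN-flavoured generator sketch: transposed 1-D convolutions upsample a
# latent vector to a raw waveform.  Layer sizes here are illustrative only.
class WaveGenerator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 16)        # 16-sample seed
        self.upsample = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, 64, kernel_size=25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(64, 1, kernel_size=25, stride=4, padding=11, output_padding=1),
            nn.Tanh(),                                    # waveform in [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 16)
        return self.upsample(x)                           # (batch, 1, 1024 samples)

waveform = WaveGenerator()(torch.randn(8, 100))
```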

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
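
The label-preserving audio deformations this reference relies on can be sketched with librosa; the parameter ranges below are assumptions for illustration, not the published settings.

```python
import numpy as np
import librosa

# Illustrative deformations: time stretching (tempo change, pitch preserved)
# and pitch shifting, both applied to the raw signal before feature extraction.
def augment(y, sr):
    rate = np.random.uniform(0.8, 1.2)           # assumed stretch range
    steps = np.random.uniform(-2.0, 2.0)         # assumed pitch-shift range (semitones)
    stretched = librosa.effects.time_stretch(y=y, rate=rate)
    shifted = librosa.effects.pitch_shift(y=y, sr=sr, n_steps=steps)
    return stretched, shifted

y, sr = librosa.load(librosa.example("trumpet"), sr=22050)
stretched, shifted = augment(y, sr)
```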

Deep Convolutional Neural Network with Mixup for Environmental Sound Classification

A novel deep convolutional neural network is proposed for environmental sound classification (ESC) tasks; it uses stacked convolutional and pooling layers to extract high-level feature representations from spectrogram-like features.
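
Mixup itself is simple to sketch: convex combinations of pairs of inputs and of their one-hot labels. The alpha value below is an assumed hyperparameter.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Minimal mixup sketch for spectrogram inputs: each example is blended with a
# randomly permuted partner, and the labels are blended with the same weight.
def mixup(x, y, alpha=0.2):
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix

x = torch.randn(16, 1, 64, 128)                  # batch of log-mel spectrograms
y = F.one_hot(torch.randint(0, 10, (16,)), 10).float()
x_mix, y_mix = mixup(x, y)                       # train with soft targets y_mix
```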

Dilated convolution neural network with LeakyReLU for environmental sound classification

A dilated CNN-based ESC (D-CNN-ESC) system is proposed in which dilated filters and the LeakyReLU activation function are adopted to increase the receptive field of the convolution layers and incorporate more contextual information; it outperforms the state-of-the-art ESC results obtained by a very deep CNN-ESC system on the UrbanSound8K dataset.
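
A minimal sketch of a dilated convolution block with LeakyReLU activations; channel counts and dilation rates are illustrative choices, not the cited system's configuration.

```python
import torch
import torch.nn as nn

# Dilated convolutions enlarge the receptive field without extra parameters;
# padding equal to the dilation keeps the spatial size constant for 3x3 kernels.
block = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, dilation=1, padding=1),
    nn.LeakyReLU(0.1),
    nn.Conv2d(32, 32, kernel_size=3, dilation=2, padding=2),
    nn.LeakyReLU(0.1),
    nn.Conv2d(32, 32, kernel_size=3, dilation=4, padding=4),
    nn.LeakyReLU(0.1),
)

out = block(torch.randn(4, 1, 64, 128))          # spatial size preserved
```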

Data Augmentation for Deep Neural Network Acoustic Modeling

Two data augmentation approaches, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for deep neural network acoustic modeling; both are label-preserving transformations intended to deal with data sparsity.
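
A heavily simplified, VTLP-flavoured sketch: randomly warping the frequency axis of a spectrogram. The published method applies a piecewise-linear warp to the mel filterbank centre frequencies; this version only illustrates the idea of a label-preserving frequency warp.

```python
import numpy as np

# Simplified frequency warp: resample each frame's frequency axis by a random
# factor.  This is an illustration of the idea, not the published VTLP formula.
def vtlp_like_warp(spec, low=0.9, high=1.1):
    n_freq, n_frames = spec.shape
    alpha = np.random.uniform(low, high)                  # warp factor
    src = np.clip(np.arange(n_freq) * alpha, 0, n_freq - 1)
    warped = np.stack(
        [np.interp(src, np.arange(n_freq), spec[:, t]) for t in range(n_frames)],
        axis=1,
    )
    return warped

spec = np.abs(np.random.randn(128, 64))                   # placeholder spectrogram
warped = vtlp_like_warp(spec)
```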

Learning Attentive Representations for Environmental Sound Classification

The role of convolution filters in detecting energy modulation patterns is examined, and a channel attention mechanism is proposed to focus on the semantically relevant channels generated by the corresponding filters, achieving state-of-the-art or competitive results in terms of classification accuracy.
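
As an illustration of channel attention, the squeeze-and-excitation style block below gates feature maps with per-channel statistics; the cited paper's mechanism may differ in detail.

```python
import torch
import torch.nn as nn

# Squeeze-and-excitation style channel attention: global average pooling
# produces per-channel statistics that re-weight the feature maps.
class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)                            # re-weight channels

features = torch.randn(4, 32, 64, 128)
attended = ChannelAttention(32)(features)
```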

Improved Training of Wasserstein GANs

This work proposes an alternative to clipping weights: penalizing the norm of the gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
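
The gradient penalty itself is short to sketch: penalise the critic's gradient norm at points interpolated between real and generated samples. The toy critic below is only there to make the example runnable.

```python
import torch

# Standard WGAN-GP gradient penalty: push the critic's gradient norm toward 1
# at random interpolates between real and fake samples.
def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 128, 1))
real = torch.randn(8, 1, 64, 128)
fake = torch.randn(8, 1, 64, 128)
gp = gradient_penalty(critic, real, fake)   # added to the critic loss
```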

Audio augmentation for speech recognition

This paper investigates audio-level speech augmentation methods which directly process the raw signal, and presents results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios.
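
Speed perturbation, one common audio-level augmentation in this line of work, can be sketched by resampling the raw signal; the factors 0.9, 1.0 and 1.1 are the commonly used choices.

```python
import librosa

# Speed perturbation sketch: resampling to sr/factor and treating the result
# as still being at sr plays the signal back "factor" times faster, changing
# both tempo and pitch.
def speed_perturb(y, sr, factor):
    return librosa.resample(y=y, orig_sr=sr, target_sr=int(sr / factor))

y, sr = librosa.load(librosa.example("trumpet"), sr=16000)
versions = [speed_perturb(y, sr, f) for f in (0.9, 1.0, 1.1)]

```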
...