Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

@article{Salamon2017DeepCN,
  title={Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification},
  author={Justin Salamon and Juan Pablo Bello},
  journal={IEEE Signal Processing Letters},
  year={2017},
  volume={24},
  pages={279--283}
}
  • Published 15 August 2016
  • Computer Science
The ability of deep convolutional neural networks (CNNs) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a…
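The augmentation the abstract refers to operates on the raw audio before spectrograms are computed, using deformations such as time stretching and background-noise mixing. The NumPy-only sketch below is illustrative rather than the paper's implementation: the function names and parameters are assumptions, and the naive resampling stretch also shifts pitch, unlike a proper phase-vocoder stretch (e.g. librosa's).

```python
import numpy as np

def add_noise(audio, snr_db=20.0, rng=None):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return audio + rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)

def time_stretch(audio, rate=1.2):
    """Naive time stretch by linear resampling.

    Note: this also changes pitch; a production augmenter would use a
    phase vocoder so duration and pitch can be changed independently.
    """
    n_out = int(len(audio) / rate)
    x_old = np.linspace(0.0, 1.0, num=len(audio))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(x_new, x_old, audio)

# Example: augment a 1-second synthetic 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t)
noisy = add_noise(clip, snr_db=20.0)       # same length, degraded SNR
stretched = time_stretch(clip, rate=1.2)   # ~17% shorter clip
```

Each deformed copy keeps the original label, multiplying the effective training-set size without new recordings.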


A Method of Environmental Sound Classification Based on Residual Networks and Data Augmentation
TLDR
A residual network called EnvResNet is proposed for the ESC task, together with audio data augmentation to overcome data scarcity; the approach achieves classification accuracy comparable to other state-of-the-art approaches.
Multi-channel Convolutional Neural Networks with Multi-level Feature Fusion for Environmental Sound Classification
TLDR
The proposed method, inspired by VGG networks, outperforms state-of-the-art end-to-end methods for environmental sound classification in terms of classification accuracy.
A New Deep CNN Model for Environmental Sound Classification
TLDR
Deep features are used for the environmental sound classification (ESC) problem via a newly developed Convolutional Neural Network (CNN) model, which is trained in an end-to-end fashion on spectrogram images.
Environmental Sound Classification with Parallel Temporal-Spectral Attention
TLDR
A novel parallel temporal-spectral attention mechanism is proposed for CNNs to learn discriminative sound representations; it enhances the temporal and spectral features by capturing the importance of different time frames and frequency bands.
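The mechanism this summary describes, reweighting time frames and frequency bands in parallel, can be sketched generically. In the NumPy sketch below the score vectors are placeholders for what a real model would learn with small sub-networks; this is not the cited paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_spectral_attention(spec, time_scores, freq_scores):
    """Reweight a (freq, time) spectrogram along both axes in parallel.

    time_scores and freq_scores stand in for learned attention logits;
    in a real model they come from sub-networks on the feature map.
    """
    a_time = softmax(time_scores)   # importance of each time frame
    a_freq = softmax(freq_scores)   # importance of each frequency band
    # Broadcast both attention maps over the spectrogram.
    return spec * a_freq[:, None] * a_time[None, :]

rng = np.random.default_rng(0)
spec = rng.random((64, 128))        # 64 mel bands x 128 frames
out = temporal_spectral_attention(spec, rng.random(128), rng.random(64))
```

The softmax keeps each axis's weights on a simplex, so the model is forced to distribute a fixed budget of attention across frames and bands.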
Metric learning based data augmentation for environmental sound classification
TLDR
This paper proposes a framework for data augmentation through metric learning, which first learns a metric from the original training data, and then uses it to filter out augmented data samples that are far from original ones in the same class.
Dilated convolution neural network with LeakyReLU for environmental sound classification
TLDR
A dilated CNN-based ESC (D-CNN-ESC) system is proposed in which dilated filters and the LeakyReLU activation function are adopted to increase the receptive field of the convolution layers and incorporate more contextual information; it outperforms state-of-the-art ESC results obtained by a very deep CNN-ESC system on the UrbanSound8K dataset.
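The receptive-field effect mentioned in this summary is easy to quantify: for stride-1 layers with kernel size k and dilations d_i, the receptive field is 1 + Σ(k-1)·d_i, so doubling the dilation each layer grows it exponentially with depth. The NumPy sketch below is generic, not the D-CNN-ESC implementation:

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid-mode 1-D cross-correlation with a dilated kernel."""
    k = len(w)
    span = (k - 1) * dilation + 1          # input window the kernel covers
    n_out = len(x) - span + 1
    out = np.empty(n_out)
    for i in range(n_out):
        out[i] = np.dot(x[i : i + span : dilation], w)
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four 3-tap layers with dilations 1, 2, 4, 8: receptive field of 31
# samples, versus 9 for the same stack without dilation.
rf = receptive_field(3, [1, 2, 4, 8])
y = dilated_conv1d(np.arange(10.0), np.array([1.0, 0.0, 0.0]), dilation=2)
```

Gradually increasing (rather than doubling) the dilation rate is one way to avoid the gridding artifact, where dilated filters sample a sparse lattice and skip neighboring inputs.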
Leveraging deep neural networks with nonnegative representations for improved environmental sound classification
TLDR
Representations based on nonnegative matrix factorization (NMF) are used to train deep neural networks for environmental sound classification; the proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets, as well as the best systems from the 2016 DCASE challenge.
Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network
TLDR
This is the first time that a single environment sound classification model is able to achieve state-of-the-art results on all three datasets, and the accuracy achieved by the proposed model exceeds human accuracy.
Deep convolutional neural network for environmental sound classification via dilation
TLDR
Gradually increasing the dilation rate mitigates the gridding artifact and lowers the computational cost; overall classification performance, precision, recall, overall accuracy, and kappa value are reported for the proposed dilated convolutional method.
...

References

SHOWING 1-10 OF 39 REFERENCES
Environmental sound classification with convolutional neural networks
  • Karol J. Piczak
  • Computer Science
    2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2015
TLDR
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Unsupervised feature learning for urban sound classification
  • J. Salamon, J. Bello
  • Computer Science
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
Feature learning is evaluated on the largest public dataset of urban sound sources available for research and compared to a baseline system based on MFCCs; it is shown that feature learning can outperform the baseline by configuring it to capture the temporal dynamics of urban sources.
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single…
Acoustic scene classification with matrix factorization for unsupervised feature learning
TLDR
The results show that the compared variants lead to significant improvements over state-of-the-art results in ASC.
Feature learning with deep scattering for urban sound analysis
  • J. Salamon, J. Bello
  • Computer Science
    2015 23rd European Signal Processing Conference (EUSIPCO)
  • 2015
TLDR
It is shown that the scattering transform can be used as an alternative signal representation to the mel-spectrogram whilst reducing both the amount of training data required for feature learning and the size of the learned codebook by an order of magnitude.
Polyphonic sound event detection using multi label deep neural networks
TLDR
Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification in this work, and the proposed method improves the accuracy by 19 percentage points overall.
Dropout: a simple way to prevent neural networks from overfitting
TLDR
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
ESC: Dataset for Environmental Sound Classification
TLDR
A new annotated collection of 2000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250000 unlabeled auditory excerpts extracted from recordings available through the Freesound project are presented.
Best practices for convolutional neural networks applied to visual document analysis
TLDR
A set of concrete best practices that document analysis researchers can use to get good results with neural networks, including a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.
...