Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
@article{Salamon2017DeepCN,
  title={Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification},
  author={Justin Salamon and Juan Pablo Bello},
  journal={IEEE Signal Processing Letters},
  year={2017},
  volume={24},
  pages={279--283}
}
The ability of deep convolutional neural networks (CNNs) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. Combined with data augmentation, the proposed model produces state-of-the-art results for environmental sound classification. We show that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a…
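The summary attributes the gain to pairing a high-capacity CNN with an augmented training set. As a minimal sketch of what "augmenting the training set" can mean in practice, the Python below adds label-preserving copies of each clip mixed with background noise at a few signal-to-noise ratios. The helper names (`mean_power`, `mix_background`, `augment_dataset`), the SNR values and the assumption that noise and clips have equal length are all hypothetical; this shows one illustrative deformation, not the paper's augmentation pipeline.

```python
import math
import random


def mean_power(signal):
    """Mean of the squared samples of a waveform."""
    return sum(x * x for x in signal) / len(signal)


def mix_background(signal, noise, snr_db):
    """Return signal + g * noise, with g chosen so the mixture has the given SNR in dB.

    Assumes `signal` and `noise` are equal-length lists of floats and that
    `noise` is not all zeros.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # Solve mean_power(signal) / (g**2 * mean_power(noise)) == 10 ** (snr_db / 10) for g.
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def augment_dataset(dataset, noise, snrs_db=(0.0, 10.0, 20.0)):
    """Return the original (waveform, label) pairs plus one noisy copy per SNR.

    Labels are copied unchanged: background noise is assumed not to alter the class.
    """
    augmented = list(dataset)
    for waveform, label in dataset:
        for snr in snrs_db:
            augmented.append((mix_background(waveform, noise, snr), label))
    return augmented


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]  # 0.1 s tone at 8 kHz
    noise = [rng.uniform(-1.0, 1.0) for _ in range(800)]
    train = augment_dataset([(clip, "siren")], noise)
    print(len(train))  # 1 original + 3 noisy copies
```

Mixing at several SNRs multiplies the number of training examples per labelled clip without new annotation, which is the property a high-capacity model can exploit.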
922 Citations
A Method of Environmental Sound Classification Based on Residual Networks and Data Augmentation
- Computer Science
- Int. J. Comput. Intell. Appl.
- 2021
A residual network for the environmental sound classification (ESC) task, called EnvResNet, is proposed together with audio data augmentation to overcome data scarcity; it achieves classification accuracy comparable to other state-of-the-art approaches.
Multi-channel Convolutional Neural Networks with Multi-level Feature Fusion for Environmental Sound Classification
- Computer Science
- MMM
- 2019
The proposed method, inspired by VGG networks, outperforms state-of-the-art end-to-end methods for environmental sound classification in terms of classification accuracy.
Spectral images based environmental sound classification using CNN with meaningful data augmentation
- Computer Science
- 2021
A New Deep CNN Model for Environmental Sound Classification
- Computer Science
- IEEE Access
- 2020
Deep features are applied to the environmental sound classification (ESC) problem through a newly developed convolutional neural network (CNN) model, trained end-to-end on spectrogram images.
Environmental Sound Classification with Parallel Temporal-Spectral Attention
- Computer Science, Environmental Science
- INTERSPEECH
- 2020
A novel parallel temporal-spectral attention mechanism for CNNs is proposed to learn discriminative sound representations; it enhances temporal and spectral features by capturing the importance of different time frames and frequency bands.
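To make "capturing the importance of different time frames and frequency bands" concrete, here is a deliberately simplified, parameter-free sketch: each branch pools a (frequency x time) feature map along one axis, turns the pooled scores into softmax weights, and reweights the map; the two branches run in parallel on the same input and are blended with a fixed weight `mix`. The published mechanism learns its weights with trainable layers, so the mean-pooling-plus-softmax scoring, the `mix` parameter and the function names are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_spectral_attention(feature_map, mix=0.5):
    """Reweight a (frequency x time) feature map with parallel temporal and spectral weights.

    feature_map: list of F rows (frequency bands), each a list of T values (time frames).
    Returns (output_map, time_weights, freq_weights).
    """
    n_freq, n_time = len(feature_map), len(feature_map[0])
    # Temporal branch: score each frame by its mean activation across frequency bands.
    time_scores = [sum(row[t] for row in feature_map) / n_freq for t in range(n_time)]
    # Spectral branch: score each band by its mean activation across time frames.
    freq_scores = [sum(row) / n_time for row in feature_map]
    w_time, w_freq = softmax(time_scores), softmax(freq_scores)
    # Scale by the axis length so uniform weights leave the map unchanged.
    temporal = [[v * w_time[t] * n_time for t, v in enumerate(row)] for row in feature_map]
    spectral = [[v * w_freq[f] * n_freq for v in row] for f, row in enumerate(feature_map)]
    # Blend the two parallel branches element-wise.
    output = [
        [mix * a + (1.0 - mix) * b for a, b in zip(row_t, row_s)]
        for row_t, row_s in zip(temporal, spectral)
    ]
    return output, w_time, w_freq


if __name__ == "__main__":
    # Two frequency bands, three frames; the middle frame holds the loud event.
    fmap = [[0.1, 2.0, 0.1], [0.2, 1.5, 0.0]]
    _, w_time, w_freq = temporal_spectral_attention(fmap)
    print([round(w, 3) for w in w_time])  # the middle frame gets the largest weight
```

Running the branches in parallel, rather than one after the other, lets each weighting see the unmodified input, which is the design idea the summary's "parallel" refers to.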
Metric learning based data augmentation for environmental sound classification
- Computer Science
- 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2017
This paper proposes a framework for data augmentation through metric learning, which first learns a metric from the original training data, and then uses it to filter out augmented data samples that are far from original ones in the same class.
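The filtering step can be sketched as follows, under stated assumptions: the learned metric is represented by a linear map `matrix` (so Euclidean distance after the map is a Mahalanobis-type distance with M = matrixᵀ·matrix), and an augmented sample is kept only if it lies within a hypothetical `threshold` of its nearest original sample of the same class. The metric-learning step itself is not shown, and the nearest-neighbour criterion is an illustrative choice, not necessarily the paper's exact rule.

```python
import math


def linear_embedding(matrix):
    """Return x -> matrix @ x; a stand-in for a metric learned on the original data."""
    def embed(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]
    return embed


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_augmented(originals, augmented, threshold, embed=lambda x: list(x)):
    """Keep augmented (features, label) pairs within `threshold` of some original
    sample of the same class, measured in the embedded (learned-metric) space."""
    refs_by_class = {}
    for x, label in originals:
        refs_by_class.setdefault(label, []).append(embed(x))
    kept = []
    for x, label in augmented:
        refs = refs_by_class.get(label)
        if not refs:
            continue  # no original sample of this class to compare against
        z = embed(x)
        if min(euclidean(z, r) for r in refs) <= threshold:
            kept.append((x, label))
    return kept


if __name__ == "__main__":
    originals = [([0.0, 0.0], "dog"), ([10.0, 10.0], "car")]
    candidates = [([0.5, 0.0], "dog"), ([9.0, 9.0], "dog"), ([10.0, 11.0], "car")]
    # The second "dog" candidate drifted far from every original dog sample and is dropped.
    print(filter_augmented(originals, candidates, threshold=2.0))
```

Changing the metric changes which candidates survive: stretching an axis that the learned metric deems discriminative makes deformations along it more likely to be rejected.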
Dilated convolution neural network with LeakyReLU for environmental sound classification
- Computer Science
- 2017 22nd International Conference on Digital Signal Processing (DSP)
- 2017
A dilated-CNN-based ESC system (D-CNN-ESC) is proposed that adopts dilated filters and the LeakyReLU activation function; dilation increases the receptive field of the convolutional layers so that they incorporate more contextual information, and the system outperforms the state-of-the-art ESC results obtained by a very deep CNN-ESC system on the UrbanSound8K dataset.
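The summary does not specify the layer layout, so for brevity the sketch below is one-dimensional. It shows how dilation widens the receptive field without adding kernel taps, using the standard formula for a stack of stride-1 convolutions, RF = 1 + Σ_l (k − 1)·d_l, and follows each layer with LeakyReLU. The kernel, the dilation schedule and the negative slope of 0.01 are illustrative assumptions, not the paper's configuration.

```python
def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: identity for positive inputs, small linear slope otherwise."""
    return x if x > 0 else negative_slope * x


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN layers).

    Kernel taps are spaced `dilation` samples apart, so a k-tap kernel spans
    (k - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of stride-1 convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_stack(signal, kernel, dilations, negative_slope=0.01):
    """Apply one dilated convolution followed by LeakyReLU per dilation rate."""
    out = signal
    for d in dilations:
        out = [leaky_relu(v, negative_slope) for v in dilated_conv1d(out, kernel, d)]
    return out


if __name__ == "__main__":
    print(receptive_field(3, [1, 1, 1]))  # 7: three ordinary 3-tap layers
    print(receptive_field(3, [1, 2, 4]))  # 15: same parameter count, doubled dilations
```

With the same number of weights per layer, the dilated stack covers more than twice the context of the undilated one; LeakyReLU keeps a non-zero gradient for negative pre-activations.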
Leveraging deep neural networks with nonnegative representations for improved environmental sound classification
- Computer Science
- 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2017
Representations based on nonnegative matrix factorization (NMF) are used to train deep neural networks for environmental sound classification; the proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets, as well as the best systems from the 2016 DCASE challenge.
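As a rough illustration of an NMF-based representation, the sketch below factorizes a nonnegative (frequency x time) magnitude spectrogram V ≈ W·H into a dictionary W and activations H, using the classic Lee and Seung multiplicative updates for the squared Euclidean error; each column of H is then a nonnegative feature vector for one frame that a network could consume. The cost function, the rank, the iteration count and the choice of H as the network input are assumptions; the summary does not say which NMF variant the paper uses.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates; `eps` guards against division by zero."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n_rows)]
    return w, h


if __name__ == "__main__":
    rng = random.Random(1)
    spectrogram = [[rng.random() for _ in range(8)] for _ in range(6)]  # 6 bins x 8 frames
    w, h = nmf(spectrogram, rank=2)
    frame_features = transpose(h)  # one 2-D nonnegative feature vector per frame
    print(len(frame_features), len(frame_features[0]))
```

Because every entry of W and H stays nonnegative, each frame is described as an additive mixture of spectral templates, which is what makes such representations attractive for mixtures of environmental sounds.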
Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network
- Computer Science
- INTERSPEECH
- 2020
This is the first time that a single environment sound classification model achieves state-of-the-art results on all three datasets, and the accuracy of the proposed model exceeds human accuracy.
Deep convolutional neural network for environmental sound classification via dilation
- Computer Science
- Journal of Intelligent & Fuzzy Systems
- 2022
The gradual increase in dilation rate addresses the adverse gridding effect and lowers the computational cost; overall classification performance, precision, recall, overall truth and kappa value are reported for the proposed dilated convolutional method.
References
Showing 1-10 of 39 references
Environmental sound classification with convolutional neural networks
- Computer Science
- 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
ImageNet classification with deep convolutional neural networks
- Computer Science
- Commun. ACM
- 2012
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Unsupervised feature learning for urban sound classification
- Computer Science
- 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
Feature learning, configured to capture the temporal dynamics of urban sound sources, is shown to outperform a baseline system based on MFCCs; the approach is evaluated on the largest public dataset of urban sound sources available for research.
Recurrent neural networks for polyphonic sound event detection in real life recordings
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
In this paper we present an approach to polyphonic sound event detection in real-life recordings based on bidirectional long short-term memory (BLSTM) recurrent neural networks (RNNs). A single…
Acoustic scene classification with matrix factorization for unsupervised feature learning
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
The results show that the compared variants significantly improve on state-of-the-art results in acoustic scene classification (ASC).
Feature learning with deep scattering for urban sound analysis
- Computer Science
- 2015 23rd European Signal Processing Conference (EUSIPCO)
- 2015
It is shown that the scattering transform can be used as an alternative signal representation to the mel-spectrogram whilst reducing both the amount of training data required for feature learning and the size of the learned codebook by an order of magnitude.
Polyphonic sound event detection using multi label deep neural networks
- Computer Science
- 2015 International Joint Conference on Neural Networks (IJCNN)
- 2015
Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification, and the proposed method improves overall accuracy by 19 percentage points.
Dropout: a simple way to prevent neural networks from overfitting
- Computer Science
- J. Mach. Learn. Res.
- 2014
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
ESC: Dataset for Environmental Sound Classification
- Computer Science
- ACM Multimedia
- 2015
A new annotated collection of 2,000 short clips covering 50 classes of common sound events is presented, together with a large unified compilation of 250,000 unlabeled audio excerpts extracted from recordings available through the Freesound project.
Best practices for convolutional neural networks applied to visual document analysis
- Computer Science
- Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
- 2003
A set of concrete best practices that document analysis researchers can use to get good results with neural networks, including a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.