Environmental sound classification with convolutional neural networks

  • Karol J. Piczak
  • Published 12 November 2015
  • Computer Science
  • 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
This paper evaluates the potential of convolutional neural networks in classifying short audio clips of environmental sounds. A deep model consisting of 2 convolutional layers with max-pooling and 2 fully connected layers is trained on a low-level representation of audio data (segmented spectrograms) with deltas. The accuracy of the network is evaluated on 3 public datasets of environmental and urban recordings. The model outperforms baseline implementations relying on mel-frequency cepstral coefficients.
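The input representation described above (spectrograms segmented into fixed-length windows, augmented with delta features) can be sketched in plain Python. The segment length, hop, and delta width below are illustrative assumptions, not the paper's exact settings:

```python
# Sketch of the input preparation: segment a (log-mel) spectrogram into
# fixed-length windows and compute delta features. Window length, hop, and
# delta width are illustrative assumptions, not the paper's exact values.

def deltas(frames, width=2):
    """First-order delta features, one delta frame per input frame.

    frames: list of equal-length feature vectors (time-major spectrogram).
    Uses the standard regression formula over +/- `width` neighbours,
    clamping frame indices at the edges.
    """
    n = len(frames)
    denom = 2 * sum(d * d for d in range(1, width + 1))
    out = []
    for t in range(n):
        delta = [0.0] * len(frames[0])
        for d in range(1, width + 1):
            prev = frames[max(t - d, 0)]
            nxt = frames[min(t + d, n - 1)]
            for i in range(len(delta)):
                delta[i] += d * (nxt[i] - prev[i]) / denom
        out.append(delta)
    return out

def segment(frames, length=41, hop=20):
    """Split a spectrogram into overlapping fixed-length segments."""
    return [frames[s:s + length]
            for s in range(0, len(frames) - length + 1, hop)]
```

Each segment, stacked with its deltas as a second input channel, would then form one training example for the two-layer convolutional network.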


Using deep convolutional neural network to classify urban sounds
This paper adopts an efficient convolutional network architecture for urban sound classification and conducts a series of experiments to verify whether the time resolution of the input spectrogram affects performance and, if so, to quantify that impact.
Deep Convolutional Neural Network with Mixup for Environmental Sound Classification
A novel deep convolutional neural network is proposed for environmental sound classification (ESC) tasks that uses stacked convolutional and pooling layers to extract high-level feature representations from spectrogram-like features.
Environmental Sounds Recognition with Convolutional-LSTM
This paper addresses the task of recognizing environmental sounds using the AudioSet dataset. Specifically, features were extracted by spectrogram conversion of AudioSet's 10-second sound data, and…
Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods
  • W. Oh
  • Computer Science
  • 2020
The highest recognition rate is achieved when using the unscaled log mel spectrum as the audio feature, which is useful for classifying the environmental sounds included in UrbanSound8K.
Sound Classification Using Convolutional Neural Networks
This paper proposes a model that uses convolutional neural networks (CNNs) to classify sounds based on the spectrograms obtained for different sound samples. The model can be used for deforestation detection, gunshot detection in urban areas, and detecting unusual street sounds, such as a cry for help or tyres screeching, at odd hours.
Deep Convolutional Neural Network with Transfer Learning for Environmental Sound Classification
A new convolutional neural network model based on the Xception model, which performs well on the JFT dataset, is proposed; test results show that the proposed approach achieves better ESC accuracy.
Deep convolutional network for urbansound classification
The efficiency of convolutional neural networks in classifying short audio snippets of urban sounds is evaluated; the model obtained 76% validation accuracy, better than other conventional models that relied only on mel-frequency cepstral coefficients.
Acoustic scene classification using convolutional neural networks
The proposed CNN approach to acoustic scene classification is shown to outperform a Gaussian mixture model baseline for the DCASE 2016 database even though training data is sparse.
Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification
This work proposes a convolutional recurrent neural network model to learn spectro-temporal features and temporal correlations, and extends this model with a frame-level attention mechanism to learn discriminative feature representations for environmental sound classification.
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
A novel end-to-end network called WaveMsNet is proposed based on a multi-scale convolution operation and a two-phase method, which obtains better audio representations by improving the frequency resolution and learning filters across the whole frequency range.


Audio event classification using deep neural networks
It is shown that the DNN has some advantage over other classification methods and that fusion of two methods can produce the best results.
ESC: Dataset for Environmental Sound Classification
A new annotated collection of 2000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250000 unlabeled auditory excerpts extracted from recordings available through the Freesound project are presented.
Audio-based Music Classification with a Pretrained Convolutional Network
A convolutional network is built and trained to perform artist recognition, genre recognition, and key detection; the convolutional approach is found to improve accuracy on the genre recognition and artist recognition tasks.
Auditory Scene Classification with Deep Belief Network
This paper first creates a more compact and representative description of the input audio clip by focusing on the salient regions of the data and modeling their contextual correlations, then exploits a deep belief network to discover and generate high-level descriptions of scene audio in an unsupervised way.
Unsupervised feature learning for audio classification using convolutional deep belief networks
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning…
Recognition of acoustic events using deep neural networks
For an acoustic event classification task containing 61 distinct classes, the classification accuracy of the neural network classifier exceeds that of the conventional Gaussian mixture model based hidden Markov model classifier.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while…
Deep convolutional neural networks for LVCSR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Convolutional Neural Networks for Speech Recognition
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Unsupervised feature learning for urban sound classification
  • J. Salamon, J. Bello
  • Computer Science
  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
It is shown that feature learning can outperform the baseline approach when configured to capture the temporal dynamics of urban sound sources. The approach is evaluated on the largest public dataset of urban sound sources available for research and compared to a baseline system based on MFCCs.