Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
@article{Huzaifah2017ComparisonOT, title={Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks}, author={Muhammad Huzaifah}, journal={ArXiv}, year={2017}, volume={abs/1706.07156} }
Recent successful applications of convolutional neural networks (CNNs) to audio classification and speech recognition have motivated the search for better input representations for more efficient training. […] Additionally, we observe that the optimal window size during transformation depends on the characteristics of the audio signal and that, architecturally, 2D convolution yielded better results than 1D convolution in most cases.
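The window-size observation reflects the usual time-frequency trade-off of the short-time Fourier transform. A longer window narrows each frequency bin, giving finer frequency resolution, but averages every frame over more time, so transients are localized less precisely. The minimal NumPy sketch below makes this concrete. It assumes a Hann window, a hop of one quarter of the window, a 16 kHz sample rate, and a synthetic 1 kHz tone plus click as input; none of these are taken from the paper.

```python
import numpy as np

def stft_magnitude(x, win_len, hop):
    """Magnitude STFT with a Hann window; returns an array of shape (freq_bins, frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

sr = 16000                              # assumed sample rate (Hz)
t = np.arange(sr) / sr                  # 1 s of audio
x = np.sin(2 * np.pi * 1000 * t)        # steady 1 kHz tone
x[sr // 2] += 5.0                       # single-sample click at 0.5 s

for win_len in (256, 1024, 4096):
    S = stft_magnitude(x, win_len, hop=win_len // 4)
    bin_hz = sr / win_len               # width of one frequency bin
    win_ms = 1000 * win_len / sr        # duration each frame averages over
    print(f"window={win_len:5d}  shape={S.shape}  bin={bin_hz:7.3f} Hz  span={win_ms:6.1f} ms")
```

A 4096-sample window resolves the tone into bins under 4 Hz wide, but each frame spans 256 ms, so the click can only be placed within that span. A 256-sample window localizes the click to within 16 ms at the cost of 62.5 Hz bins.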
93 Citations
Environmental Sound Classification with Parallel Temporal-Spectral Attention
- Computer Science, Environmental Science
- INTERSPEECH
- 2020
A novel parallel temporal-spectral attention mechanism for CNN to learn discriminative sound representations is proposed, which enhances the temporal and spectral features by capturing the importance of different time frames and frequency bands.
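The idea of weighting time frames and frequency bands can be illustrated with a toy, parameter-free sketch. Two parallel branches rescale a (frequency, time) feature map: one along time, one along frequency. Here the scores are simply softmax-normalized mean activations. This is an assumption made for illustration: the cited paper learns its attention weights, and the averaging used below to fuse the two branches is also hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def temporal_spectral_reweight(feat):
    """Reweight a (freq, time) feature map along time and along frequency in two parallel branches.

    Toy stand-in: scores are mean activations per frame / per band passed through a
    softmax. The cited paper learns its attention weights; this sketch has no parameters.
    """
    time_w = softmax(feat.mean(axis=0))          # one weight per time frame
    freq_w = softmax(feat.mean(axis=1))          # one weight per frequency band
    temporal = feat * time_w[np.newaxis, :]      # emphasize informative frames
    spectral = feat * freq_w[:, np.newaxis]      # emphasize informative bands
    fused = 0.5 * (temporal + spectral)          # assumed fusion: simple average
    return fused, time_w, freq_w

rng = np.random.default_rng(0)
feat = rng.random((40, 100))                     # 40 bands x 100 frames of noise
feat[10:20, 50:60] += 3.0                        # an "event" in bands 10-19, frames 50-59
fused, time_w, freq_w = temporal_spectral_reweight(feat)
print("peak frame:", int(time_w.argmax()), " peak band:", int(freq_w.argmax()))
```

The largest weights land on the frames and bands that contain the synthetic event. A learned attention module would instead decide which frames and bands matter from training data.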
Audio representation for environmental sound classification using convolutional neural networks
- Computer Science
- 2018
A convolutional neural network (CNN) training framework is described and implemented, and the model is shown to be relatively robust against wind noise: accuracy remains above 60% until the SNR between signal and wind noise approaches 9 dB.
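The robustness result is stated in terms of the signal-to-noise ratio, SNR_dB = 10·log10(P_signal / P_noise), where P is average power. A common way to build such a test condition is to scale a noise recording so the mixture hits a target SNR. The minimal sketch below assumes both signals share a sample rate and the noise is at least as long as the signal. White Gaussian noise stands in for recorded wind noise, and the exact mixing procedure of the cited work is not given here.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)              # average signal power
    p_noise = np.mean(noise ** 2)                # average noise power before scaling
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise, gain * noise

def measured_snr_db(signal, noise):
    """SNR in dB between a clean signal and the (scaled) noise actually added."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)    # placeholder "event"
wind = rng.normal(size=sr)                              # placeholder for recorded wind noise
noisy, scaled_noise = mix_at_snr(clean, wind, snr_db=9.0)
print(f"measured SNR: {measured_snr_db(clean, scaled_noise):.2f} dB")
```

The gain follows from requiring gain²·P_noise = P_signal / 10^(SNR_dB/10), so the measured SNR of the added noise equals the target.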
Environment Sound Classification using Multiple Feature Channels and Deep Convolutional Neural Networks
- Computer Science
- ArXiv
- 2019
To the best of the authors' knowledge, this is the first time a single environment sound classification model has achieved state-of-the-art results on all three datasets, and by a considerable margin over previous models.
Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network
- Computer Science
- INTERSPEECH
- 2020
This is the first time that a single environment sound classification model is able to achieve state-of-the-art results on all three datasets, and the accuracy achieved by the proposed model is beyond human accuracy.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
- Computer Science
- 2016
The proposed sound event detection (SED) system is compared against the state-of-the-art mono-channel method on the development subset of the TUT sound events detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.
An Ensemble Stacked Convolutional Neural Network Model for Environmental Event Sound Recognition
- Computer Science
- Applied Sciences
- 2018
A novel stacked CNN model with multiple convolutional layers of decreasing filter sizes is proposed to improve the performance of CNN models with either log-mel feature input or raw waveform input; these models are then combined to build the ensemble DS-CNN model for environmental sound classification (ESC).
Multi-stream Network With Temporal Attention For Environmental Sound Classification
- Computer Science
- INTERSPEECH
- 2019
This work introduces a multi-stream convolutional neural network with temporal attention that addresses problems of environmental sound classification systems and achieves new state-of-the-art performance without any changes in network architecture or front-end preprocessing, thus demonstrating better generalizability.
A Real-Time Convolutional Neural Network Based Speech Enhancement for Hearing Impaired Listeners Using Smartphone
- Computer Science
- IEEE Access
- 2019
A speech enhancement (SE) technique based on a multi-objective learning convolutional neural network is presented to improve the overall quality of speech perceived by hearing aid (HA) users.
CNN and Sound Processing-Based Audio Classifier for Alarm Sound Detection
- Computer Science
- 2020
Artificial neural networks (ANNs) have evolved through many stages over the last three decades, with many researchers contributing to this challenging field. With the power of math, complex problems can…
Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation
- Computer Science
- ArXiv
- 2021
This work proposes a novel polyphonic sound event detection (PSED) framework that incorporates multi-type, multi-scale time-frequency representations (TFRs) and applies a novel approach to adaptively fuse different models and TFRs symbiotically, so that overall performance can be significantly improved.
References
Showing references 1-10 of 33.
Time–Frequency Matrix Feature Extraction and Classification of Environmental Audio Signals
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2011
The results of the numerical simulation support the effectiveness of the proposed approach for environmental audio classification with over 10% accuracy-rate improvement compared to the MFCC features.
Environmental Sound Recognition With Time–Frequency Audio Features
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2009
An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
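Matching pursuit greedily decomposes a signal over a dictionary of unit-norm atoms. At each step it picks the atom with the largest absolute inner product with the current residual, records that coefficient, and subtracts the atom's contribution. The following is a generic sketch over a small Gabor dictionary (Gaussian-windowed cosines). The scales, frequencies, shifts, and number of iterations are hypothetical choices, not the configuration or feature extraction used in the cited paper.

```python
import numpy as np

def gabor_dictionary(n, scales, freqs, shifts):
    """Unit-norm Gabor atoms (Gaussian-windowed cosines) as the columns of an (n, K) matrix."""
    t = np.arange(n)
    atoms, params = [], []
    for s in scales:
        for f in freqs:
            for u in shifts:
                g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(2 * np.pi * f * (t - u))
                atoms.append(g / np.linalg.norm(g))
                params.append((s, f, u))
    return np.stack(atoms, axis=1), params

def matching_pursuit(x, D, n_atoms):
    """Greedy MP: repeatedly take the atom most correlated with the residual and subtract it."""
    residual = np.asarray(x, dtype=float).copy()
    chosen = []
    for _ in range(n_atoms):
        corr = D.T @ residual                    # inner product with every atom
        k = int(np.argmax(np.abs(corr)))         # best-matching atom
        chosen.append((k, float(corr[k])))
        residual -= corr[k] * D[:, k]            # remove its contribution
    return chosen, residual

# Hypothetical dictionary: 3 scales x 3 frequencies (cycles/sample) x 16 shifts = 144 atoms.
D, params = gabor_dictionary(
    256, scales=(8, 32, 128), freqs=(0.05, 0.1, 0.2), shifts=range(0, 256, 16)
)
i_a, i_b = params.index((32, 0.1, 64)), params.index((32, 0.2, 192))
x = 2.0 * D[:, i_a] + 0.5 * D[:, i_b]            # synthetic two-atom signal
chosen, residual = matching_pursuit(x, D, n_atoms=2)
print([params[k] for k, _ in chosen], np.linalg.norm(residual))
```

The selected atoms' (scale, frequency, position) parameters are what make such decompositions useful as time-frequency features. The strongest atom is found first, and the residual shrinks at every step.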
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
- Computer Science
- IEEE Signal Processing Letters
- 2017
It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.
Environmental sound classification with convolutional neural networks
- Computer Science
- 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
Convolutional Neural Networks for Speech Recognition
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2014
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Deep Convolutional Neural Networks for Large-scale Speech Tasks
- Computer Science
- Neural Networks
- 2015
Very short time environmental sound classification based on spectrogram pattern matching
- Computer Science
- Inf. Sci.
- 2013
Deep convolutional neural networks for LVCSR
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while…
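Max pooling along the frequency axis gives only constrained frequency-shift invariance. A spectral peak that moves by one bin leaves the pooled output unchanged only if it stays inside the same pooling region, and larger pools absorb more such shifts. The sketch below counts, for several pool sizes, how many one-bin shifts of a single peak leave the output unchanged. The cited summary is truncated, so the exact heterogeneous-pooling design is not reproduced here. Comparing pool sizes side by side is only an assumed stand-in for using different pool sizes in different parts of the network.

```python
import numpy as np

def freq_max_pool(column, pool):
    """Non-overlapping max pooling along frequency for one spectrogram column."""
    n = len(column) // pool * pool               # drop bins that do not fill a whole pool
    return column[:n].reshape(-1, pool).max(axis=1)

def shift_invariant(peak_bin, pool, n_bins=16):
    """True if moving a single spectral peak up one bin leaves the pooled output unchanged."""
    col = np.zeros(n_bins)
    col[peak_bin] = 1.0
    shifted = np.roll(col, 1)                    # peak moves from peak_bin to peak_bin + 1
    return np.array_equal(freq_max_pool(col, pool), freq_max_pool(shifted, pool))

for pool in (1, 2, 4):
    hits = sum(shift_invariant(b, pool) for b in range(15))
    print(f"pool={pool}: output unchanged for {hits}/15 one-bin shifts")
```

Larger pools tolerate more frequency shifts but also merge more neighboring bins. This is the invariance-versus-confusion trade-off named in the paper's title.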
Unsupervised feature learning for audio classification using convolutional deep belief networks
- Computer Science
- NIPS
- 2009
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning…