
The effect of room acoustics on audio event classification

@inproceedings{Emmanouilidou2019TheEO,
  title={The effect of room acoustics on audio event classification},
  author={Dimitra Emmanouilidou and Hannes Gamper},
  year={2019}
}
The increasing availability of large-scale annotated databases, together with advances in data-driven learning and deep neural networks, has pushed the state of the art for computer-aided detection problems like audio scene analysis and event classification. However, the large variety of acoustic environments and their acoustic properties encountered in practice can pose a great challenge for such tasks and compromise the robustness of general-purpose classifiers when tested in unseen…
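The robustness issue raised in the abstract is commonly probed by re-synthesizing test material under different room conditions. Below is a minimal sketch, not taken from the paper, of how a reverberant test signal can be produced by convolving a dry recording with a room impulse response (RIR); the file names are placeholders and numpy/scipy availability is assumed.

```python
# Minimal sketch (not from the paper): simulate a reverberant test condition by
# convolving a dry recording with a room impulse response (RIR).
# "event_clean.wav" and "room_rir.wav" are placeholder file names.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

sr_clean, clean = wavfile.read("event_clean.wav")   # hypothetical dry audio event
sr_rir, rir = wavfile.read("room_rir.wav")          # hypothetical measured RIR
assert sr_clean == sr_rir, "resample first so both signals share one sample rate"

clean = clean.astype(np.float64)
rir = rir.astype(np.float64)
rir /= np.max(np.abs(rir)) + 1e-12                  # normalize the RIR peak

# The reverberant observation is the dry signal convolved with the RIR; this is
# the kind of input that degrades a classifier trained only on dry recordings.
reverberant = fftconvolve(clean, rir, mode="full")[: len(clean)]
reverberant /= np.max(np.abs(reverberant)) + 1e-12

wavfile.write("event_reverberant.wav", sr_clean, reverberant.astype(np.float32))
```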

Citations

Predicting Word Error Rate for Reverberant Speech
The proposed non-intrusive CNN model outperforms C50-based WER prediction, indicating that WER can be estimated blindly, i.e., directly from the reverberant speech samples without knowledge of the acoustic parameters.
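
For context, here is a small illustrative sketch (my own, not code from the cited paper) of the C50 clarity index, the acoustic parameter used by the baseline mentioned above, computed from a known room impulse response: the ratio, in dB, of the energy in the first 50 ms after the direct sound to the remaining energy.

```python
# Illustrative sketch (my own, not code from the cited paper): the C50 clarity
# index computed from a known RIR as
# 10*log10(energy in the first 50 ms after the direct sound / remaining energy).
import numpy as np

def c50_from_rir(rir, sr):
    rir = np.asarray(rir, dtype=np.float64)
    onset = int(np.argmax(np.abs(rir)))   # crude estimate of the direct-sound onset
    split = onset + int(0.050 * sr)       # boundary 50 ms after the direct sound
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / (late + 1e-12))
```
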
StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation
The stochastic room impulse response generation method StoRIR, when used for audio data augmentation in a speech enhancement task, allows deep learning models to achieve better results across a wide range of metrics than the conventional image-source method.
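
As a rough illustration of stochastic RIR generation, and not the StoRIR method itself, the sketch below models an impulse response as white noise shaped by an exponential decay chosen to match a target T60; actual generators such as StoRIR add more structure, for example distinct early reflections.

```python
# Rough illustration only (a simplified stochastic model, not the StoRIR method):
# an RIR approximated as white noise shaped by an exponential decay matching a
# target reverberation time T60.
import numpy as np

def toy_stochastic_rir(t60, sr=16000, length_s=1.0, seed=None):
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * sr)) / sr
    envelope = np.exp(-6.91 * t / t60)    # amplitude decay reaching -60 dB at t = T60
    rir = rng.standard_normal(t.shape) * envelope
    return rir / (np.max(np.abs(rir)) + 1e-12)
```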

References

Showing 10 of 30 references.
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Detection and classification of acoustic scenes and events: An IEEE AASP challenge
An overview of systems submitted to the public evaluation challenge on acoustic scene classification and detection of sound events within a scene is provided, along with a detailed evaluation of the results achieved by those systems.
The ACE challenge — Corpus description and performance evaluation
The Acoustic Characterization of Environments (ACE) Challenge is a competition to identify the most promising non-intrusive methods for estimating the direct-to-reverberant ratio (DRR) and reverberation time (T60) from real noisy reverberant speech.
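
As a reference point for one of the quantities targeted in the ACE challenge, the sketch below (my own illustration, not challenge code) computes the DRR from a known impulse response; T60 would typically be obtained from the Schroeder backward integration of the same response. The 8 ms direct-sound window is an assumption, not the challenge definition.

```python
# Illustrative sketch (my own, not ACE challenge code): the direct-to-reverberant
# ratio (DRR) computed from a known RIR, using an assumed 8 ms window around the
# direct-sound peak as the "direct" part.
import numpy as np

def drr_from_rir(rir, sr, direct_window_ms=8.0):
    rir = np.asarray(rir, dtype=np.float64)
    peak = int(np.argmax(np.abs(rir)))
    half = int(direct_window_ms / 1000.0 * sr / 2)
    lo, hi = max(0, peak - half), peak + half
    direct = np.sum(rir[lo:hi] ** 2)
    reverberant = np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / (reverberant + 1e-12))
```
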
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
The REVERB challenge, an evaluation campaign designed to assess speech enhancement and ASR techniques for reverberant speech, is described, revealing the state-of-the-art techniques and yielding new insights regarding potential future research directions.
Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes
This work describes a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data, and proposes methods for learning representations with this model that can be effectively used for solving the target task.
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
This paper presents DCASE 2018 Task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Evaluation of Sound Event Detection, Classification and Localization in the Presence of Background Noise for Acoustic Surveillance of Hazardous Situations
A classifier based on a support vector machine and a sound source localization algorithm based on the analysis of multichannel signals from an acoustic vector sensor are presented.
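
To make the classification setup referred to here concrete, the following sketch (not the authors' surveillance system) trains a generic support vector machine on per-clip feature vectors; the features and labels are synthetic placeholders, and scikit-learn availability is assumed.

```python
# Illustrative sketch (not the cited surveillance system): a support vector
# machine classifier over per-clip feature vectors. The features and labels
# below are synthetic placeholders standing in for real audio features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))       # placeholder per-clip feature vectors
y = rng.integers(0, 4, size=200)         # placeholder labels for 4 event types

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```
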
AclNet: efficient end-to-end audio classification CNN
An efficient end-to-end convolutional neural network architecture for audio classification, AclNet, is proposed; it achieves state-of-the-art performance on the ESC-50 corpus with 85.65% accuracy when trained with data augmentation and regularization.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described: a large-scale dataset of manually annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
ESC: Dataset for Environmental Sound Classification
A new annotated collection of 2,000 short clips comprising 50 classes of common sound events is presented, along with an abundant unified compilation of 250,000 unlabeled auditory excerpts extracted from recordings available through the Freesound project.