Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge

@article{Mesaros2018DetectionAC,
  title={Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge},
  author={Annamaria Mesaros and Toni Heittola and Emmanouil Benetos and Peter Foster and Mathieu Lagrange and Tuomas Virtanen and Mark D. Plumbley},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2018},
  volume={26},
  pages={379-393}
}
Public evaluation campaigns and datasets promote active development in target research areas, allowing direct comparison of algorithms. The datasets created for and used in DCASE 2016 are publicly available and are a valuable resource for further research.
Sound Event Detection in the DCASE 2017 Challenge
TLDR
Analysis of the systems' behavior reveals that task-specific optimization plays a big role in producing good performance; however, this optimization often closely follows the ranking metric, and maximizing or minimizing it does not result in universally good performance.
The effect of room acoustics on audio event classification
TLDR
The impact of mismatches between training and testing conditions in terms of acoustical parameters, including the reverberation time (T60) and the direct-to-reverberant ratio (DRR), on audio classification accuracy and class separability is studied.
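As an illustration only (not taken from that paper), a minimal numpy sketch of how the two acoustical parameters it studies, the reverberation time T60 and the direct-to-reverberant ratio (DRR), can be estimated from a measured room impulse response; the array name rir, the sampling rate fs, and the 2.5 ms direct-path window are assumptions.

import numpy as np

def direct_to_reverberant_ratio(rir, fs, direct_window_ms=2.5):
    """Estimate DRR: energy around the direct-path peak vs. the remaining tail."""
    peak = np.argmax(np.abs(rir))
    half = int(direct_window_ms * 1e-3 * fs)
    lo, hi = max(0, peak - half), peak + half + 1
    direct = np.sum(rir[lo:hi] ** 2)
    reverberant = np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / reverberant)

def t60_from_edc(rir, fs):
    """Estimate T60 via Schroeder backward integration (fit -5..-35 dB, extrapolate to -60 dB)."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]            # energy decay curve
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(rir)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -35.0)
    slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                             # seconds to decay by 60 dB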
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
TLDR
The proposed SED system is compared against the state of the art mono channel method on the development subset of TUT sound events detection 2016 database and the usage of spatial and harmonic features are shown to improve the performance of SED.
A Review of Deep Learning Based Methods for Acoustic Scene Classification
TLDR
This article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling.
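As a small illustration of one augmentation technique commonly covered by such reviews (mixup), a hedged numpy sketch; the batch layout, one-hot labels, and alpha value are assumptions, not details from the article.

import numpy as np

def mixup_batch(features, labels, alpha=0.2, rng=np.random.default_rng()):
    """Mixup augmentation: convex combinations of feature/label pairs.

    features: (batch, mel_bins, frames) log-mel patches
    labels:   (batch, num_classes) one-hot targets
    """
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(features))
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y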
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation
TLDR
This technical report presents a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge, proposing a novel two-stage ASC system leveraging an ad-hoc score combination of two convolutional neural networks.
Open-Set Acoustic Scene Classification with Deep Convolutional Autoencoders
TLDR
This paper contains a description of an open-set acoustic scene classification system submitted to task 1C of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019, which consists of a combination of convolutional neural networks for closed-set identification and deep convolutional autoencoders for outlier detection.
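To illustrate the general idea (not the submitted system itself), a minimal PyTorch sketch of reconstruction-error-based outlier detection with a small convolutional autoencoder; the 1 x 64 x 64 log-mel patch size and channel counts are assumptions.

import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Small convolutional autoencoder over log-mel patches (1 x 64 x 64 assumed)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def outlier_score(model, x):
    """Mean squared reconstruction error; high values suggest an unknown scene."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2, 3))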
TASK 3 DCASE 2020: SOUND EVENT LOCALIZATION AND DETECTION USING RESIDUAL SQUEEZE-EXCITATION CNNS Technical Report
TLDR
This work aims to improve the accuracy results of the baseline CRNN by adding residual squeeze-excitation blocks in the convolutional part of the CRNN, and shows that by simply introducing the residual SE blocks, the results obtained in the development phase clearly exceed the baseline.
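As an illustration only (not the report's exact architecture), a hedged PyTorch sketch of a residual squeeze-excitation block of the kind described; channel counts and the reduction ratio are assumptions.

import torch
import torch.nn as nn

class ResidualSEBlock(nn.Module):
    """Conv block with a squeeze-excitation gate and an identity shortcut."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.squeeze = nn.AdaptiveAvgPool2d(1)                    # global average pool
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.conv(x)
        w = self.squeeze(y).flatten(1)            # (batch, channels)
        w = self.excite(w).view(*w.shape, 1, 1)   # channel-wise gates
        return torch.relu(x + y * w)              # residual connection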
A multi-device dataset for urban acoustic scene classification
TLDR
The acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task are introduced, and the performance of a baseline system in the task is evaluated.
DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics
TLDR
The setup of Task 5 is presented, including a description of the task, the dataset, and the baseline system, which is intended to lower the hurdle to participating in the challenge and to provide a reference performance.
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification
TLDR
A new type of pooling layer is proposed, aimed at compensating for non-relevant information in audio events by applying an adaptive transformation of the convolutional feature maps along the temporal axis that follows a uniform distance subsampling criterion on the learned feature space.
...
...

References

SHOWING 1-10 OF 70 REFERENCES
Sound event detection in synthetic audio: Analysis of the dcase 2016 task results
TLDR
This task, which follows the ‘Event Detection-Office Synthetic’ task of DCASE 2013, studies the behaviour of tested algorithms when facing controlled levels of audio complexity with respect to background noise and polyphony/density.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Budapest, Hungary, 3 Sep 2016.
TLDR
The proposed SED system is compared against the state of the art mono channel method on the development subset of TUT sound events detection 2016 database and the usage of spatial and harmonic features are shown to improve the performance of SED.
DCASE 2016 Acoustic Scene Classification Using Convolutional Neural Networks
TLDR
This workshop paper presents the use of a convolutional neural network trained to classify short sequences of audio, represented by their log-mel spectrogram, and proposes a training method that can be used when the system validation performance saturates as the training proceeds.
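A hedged sketch of the general recipe that TLDR describes (log-mel spectrogram input to a small CNN scene classifier); librosa and PyTorch are used, and the mel/FFT parameters and the 15-class output are assumptions rather than the paper's exact settings.

import librosa
import torch
import torch.nn as nn

def log_mel(path, sr=44100, n_mels=60, n_fft=2048, hop=1024):
    """Log-mel spectrogram of a short audio sequence."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=n_mels)
    return librosa.power_to_db(mel)               # (n_mels, frames)

class SceneCNN(nn.Module):
    """Tiny CNN over log-mel patches; one output score per scene class."""
    def __init__(self, n_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                         # x: (batch, 1, n_mels, frames)
        return self.classifier(self.features(x).flatten(1))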
Exploiting spectro-temporal locality in deep learning based acoustic event detection
TLDR
Two feature extraction strategies are explored: using multiple-resolution spectrograms simultaneously and analyzing the overall and event-wise influence to combine the results, and using convolutional neural networks (CNNs), a state-of-the-art 2D feature extraction model that exploits local structure, with log-power spectrogram input for AED.
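As a minimal illustration of the multi-resolution idea (not the paper's exact pipeline), a librosa sketch that computes log-power spectrograms of the same signal at several analysis window lengths; the FFT sizes and hop length are assumptions.

import librosa
import numpy as np

def multi_resolution_spectrograms(y, sr, n_ffts=(512, 1024, 2048), hop=256):
    """Log-power spectrograms of one signal at several analysis resolutions.

    Shorter windows give finer time resolution, longer windows finer frequency
    resolution; a detector can be run on each and the outputs combined.
    """
    specs = []
    for n_fft in n_ffts:
        S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
        specs.append(librosa.power_to_db(S))
    return specs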
CP-JKU SUBMISSIONS FOR DCASE-2016 : A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS
TLDR
This report describes the CP-JKU team's four submissions for Task 1 (audio scene classification) of the DCASE-2016 challenge, proposing a novel i-vector extraction scheme for ASC using both left and right audio channels and a deep convolutional neural network architecture trained on spectrograms of audio excerpts in an end-to-end fashion.
CQT-based Convolutional Neural Networks for Audio Scene Classification
TLDR
It is shown in this paper that a constant-Q-transformed input to a convolutional neural network improves results, and a parallel (graph-based) neural network architecture is proposed which captures relevant audio characteristics both in time and in frequency.
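For reference only, a hedged librosa sketch of the constant-Q front end such a system would feed into a CNN; the sampling rate, hop length, and bin counts are assumptions, not the paper's settings.

import librosa
import numpy as np

def cqt_features(path, sr=22050, hop=512, bins_per_octave=24, n_bins=24 * 8):
    """Log-magnitude constant-Q transform: log-spaced frequency bins, with
    finer frequency resolution at low frequencies than a same-size STFT."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                           bins_per_octave=bins_per_octave, n_bins=n_bins))
    return librosa.amplitude_to_db(C)             # (n_bins, frames) CNN input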
Assessment of human and machine performance in acoustic scene classification: Dcase 2016 case study
TLDR
Human and machine performance in acoustic scene classification is examined through a parallel experiment using TUT Acoustic Scenes 2016 dataset, and an expert listener trained for the task obtained similar accuracy to the average of submitted systems.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and a sound event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
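In the spirit of the MFCC/GMM baseline mentioned there, a minimal sketch with librosa and scikit-learn (one GMM per scene class, classification by maximum log-likelihood); the number of MFCCs and mixture components are assumptions, not necessarily the official baseline's values.

import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=44100, n_mfcc=20):
    y, sr = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def train_gmms(files_per_class, n_components=16):
    """One GMM per acoustic scene class, fit on pooled MFCC frames."""
    return {label: GaussianMixture(n_components, covariance_type='diag')
                   .fit(np.vstack([mfcc_frames(f) for f in files]))
            for label, files in files_per_class.items()}

def classify(gmms, path):
    """Pick the class whose GMM gives the highest average log-likelihood."""
    frames = mfcc_frames(path)
    return max(gmms, key=lambda label: gmms[label].score(frames))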
ACOUSTIC SCENE CLASSIFICATION USING PARALLEL COMBINATION OF LSTM AND CNN
TLDR
This paper proposes a neural network architecture for exploiting sequential information, composed of two separate lower networks and one upper network, referred to as LSTM layers, CNN layers, and connected layers, respectively.
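A hedged PyTorch sketch of that parallel layout (LSTM branch and CNN branch concatenated into fully connected layers); the layer sizes and the 15-class output are assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class ParallelLSTMCNN(nn.Module):
    """Two lower branches (LSTM over frames, CNN over the time-frequency patch)
    whose outputs are concatenated and fed to connected layers."""
    def __init__(self, n_mels=40, n_classes=15, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=hidden, batch_first=True)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.connected = nn.Sequential(
            nn.Linear(hidden + 32, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                     # x: (batch, 1, n_mels, frames)
        seq = x.squeeze(1).transpose(1, 2)    # (batch, frames, n_mels) for the LSTM
        _, (h, _) = self.lstm(seq)            # final hidden state
        cnn_out = self.cnn(x)
        return self.connected(torch.cat([h[-1], cnn_out], dim=1))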
Score Fusion of Classification Systems for Acoustic Scene Classification
TLDR
This study explores several methods in three aspects; feature extraction, generative/discriminative machine learning, and score fusion for final decision on the acoustic scene classification task of the IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events.
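As a generic illustration of late score fusion (not that study's specific fusion rules), a short numpy sketch that combines per-class scores from several systems with a weighted sum; the assumption that scores are already normalised (e.g. posteriors) is the author's, not the paper's.

import numpy as np

def fuse_scores(score_matrices, weights=None):
    """Weighted sum of per-class scores from several classifiers.

    score_matrices: list of (n_clips, n_classes) arrays, one per system,
                    assumed already normalised (e.g. posteriors).
    """
    weights = np.ones(len(score_matrices)) if weights is None else np.asarray(weights)
    fused = sum(w * s for w, s in zip(weights, score_matrices))
    return fused.argmax(axis=1)               # fused class decision per clip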
...
...