Corpus ID: 53007193

DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System

@inproceedings{Mesaros2017DCASE2017CS,
  title={DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System},
  author={Annamaria Mesaros and Toni Heittola and Aleksandr Diment and Benjamin Elizalde and Ankit Shah and Emmanuel Vincent and Bhiksha Raj and Tuomas Virtanen},
  booktitle={DCASE},
  year={2017}
}
The DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using a multilayer perceptron and log mel-energies, but differ…
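The abstract names log mel-energies as the shared feature representation of the baseline systems. As a rough illustration of how such features are computed, here is a minimal NumPy sketch of log mel-energy extraction; the parameter values (`n_fft`, `hop`, `n_mels`) are illustrative assumptions, not the challenge's exact baseline settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):               # rising edge of the triangle
            fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):               # falling edge of the triangle
            fb[i - 1, k] = (r - k) / (r - c)
    return fb

def log_mel_energies(signal, sr=44100, n_fft=2048, hop=1024, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # project onto the mel filterbank, and take the logarithm.
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(signal[s:s + n_fft] * window)) ** 2
              for s in range(0, len(signal) - n_fft + 1, hop)]
    S = np.array(frames)                    # (n_frames, n_fft // 2 + 1)
    fb = mel_filterbank(n_mels, n_fft, sr)
    return np.log(S @ fb.T + 1e-10)         # (n_frames, n_mels)
```

The resulting per-frame feature vectors would then be fed (possibly with context stacking) to a multilayer perceptron classifier, as described for the baseline.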

Citations

Sound Event Detection in the DCASE 2017 Challenge
TLDR
Analysis of the systems' behavior reveals that task-specific optimization plays a large role in producing good performance; however, this optimization often closely follows the ranking metric, and maximizing or minimizing it does not result in universally good performance.
DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics
TLDR
The setup of Task 5 is presented, including a description of the task, the dataset, and the baseline system, which is intended to lower the hurdle to participating in the challenge and to provide a reference performance.
Acoustic Scene Classification: An Overview of Dcase 2017 Challenge Entries
TLDR
Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel frequency representations in acoustic scene classification, and indicates that combinations of top systems are capable of reaching close to perfect performance on the given data.
THE SEIE-SCUT SYSTEMS FOR IEEE AASP CHALLENGE ON DCASE 2017 : DEEP LEARNING TECHNIQUES FOR AUDIO REPRESENTATION AND CLASSIFICATION
TLDR
Evaluated on the development datasets of DCASE 2017, the systems are superior to the corresponding baselines for tasks 1 and 2, and the system for task 3 performs as well as the baseline in terms of the predominant metrics.
DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline
TLDR
A cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a “CNN Baseline” system implementing CNNs with 4 and 8 layers, originating from AlexNet and VGG in computer vision.
CLASSIFYING SHORT ACOUSTIC SCENES WITH I-VECTORS AND CNNS : CHALLENGES AND OPTIMISATIONS FOR THE 2017 DCASE ASC TASK
TLDR
The result of the CP-JKU team’s experiments is a classification system that achieves classification accuracies of around 90% on the provided development data, as estimated via the prescribed four-fold cross-validation scheme.
Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
TLDR
This paper proposes generic cross-task baseline systems based on convolutional neural networks (CNNs) and finds that the 9-layer CNN with average pooling is a good model for a majority of the DCASE 2019 tasks.
Weighted and Multi-Task Loss for Rare Audio Event Detection
TLDR
Two loss functions tailored for rare audio event detection in audio streams are presented: the weighted loss is designed to tackle the common issue of imbalanced data in background/foreground classification, while the multi-task loss enables the networks to simultaneously model the class distribution and the temporal structure of the target events for recognition.
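One common way to address the background/foreground imbalance mentioned above is a class-weighted binary cross-entropy, where errors on the rare foreground class are up-weighted. The sketch below illustrates that general idea in NumPy; it is not necessarily the paper's exact formulation, and the weight values are illustrative assumptions.

```python
import numpy as np

def weighted_bce(y_true, y_pred, w_pos=10.0, w_neg=1.0, eps=1e-7):
    # Binary cross-entropy with per-class weights: w_pos up-weights the
    # rare foreground (event) class relative to the dominant background.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(w_pos * y_true * np.log(y_pred)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()
```

With `w_pos > w_neg`, a missed event costs more than a false alarm of equal confidence, pushing the model away from the trivial all-background solution.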
WAVELET-BASED AUDIO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Technical Report
TLDR
Two wavelet-based features in a score-fusion framework are found to be complementary, so that the fused system relatively outperforms the deep-learning-based baseline system on the development dataset provided for the respective sub-tasks.
Evaluation of Modulation-MFCC Features and DNN Classification for Acoustic Event Detection
TLDR
Traditional techniques and different deep learning architectures are used, including convolutional and recurrent models, in the context of real-life everyday audio recordings under realistic yet challenging multisource conditions.
...

References

Showing 1-10 of 30 references
Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording
TLDR
The work on Task 1 (Acoustic Scene Classification) and Task 3 (Sound Event Detection in Real Life Recordings) uses low-level and high-level features, classifier optimization, and other heuristics specific to each task.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)
TLDR
This paper proposes an evolutionary approach to automatically generate a suitable neural network architecture and hyperparameters for any given classification problem and takes the DCASE 2018 Challenge as an opportunity to evaluate this approach.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models are presented.
Consumer-level multimedia event detection through unsupervised audio signal modeling
TLDR
A novel acoustic characterization approach to multimedia event detection (MED) task for unconstrained and unstructured consumer-level videos through audio signal modeling that better accounts for temporal dependencies than previously proposed MFCC bag-of-word approaches.
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
TLDR
A shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multilabel classification task and a symmetric or asymmetric deep denoising auto-encoder (syDAE or asyDAE) to generate new data-driven features from the logarithmic Mel-filter banks features.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations
TLDR
A method that bypasses the supervised construction of class models is presented, which learns the components as a non-negative dictionary in a coupled matrix factorization problem, where the spectral representation and the class activity annotation of the audio signal share the activation matrix.
Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments
TLDR
An efficient combination of confidence-based Active Learning and Self-Training aimed at minimizing the need for human annotation in sound classification model training, requiring significantly fewer labeled instances.
Detecting audio events for semantic video search
TLDR
The experiments with SVM classifiers, and different features, using a 290-hour corpus of sound effects, which allowed us to build detectors for almost 50 semantic concepts, showed that the task is much harder in real-life videos, which so often include overlapping audio events.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
...