Corpus ID: 53007193

DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System

@inproceedings{Mesaros2017DCASE2017CS,
  title={DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System},
  author={Annamaria Mesaros and Toni Heittola and Aleksandr Diment and Benjamin Elizalde and Ankit Shah and Emmanuel Vincent and Bhiksha Raj and Tuomas Virtanen},
  booktitle={DCASE},
  year={2017}
}
The DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ…
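The log mel-energy features mentioned above can be sketched as follows. This is a minimal NumPy-only illustration of the general technique (framing, magnitude spectra, triangular mel filterbank, log compression), not the authors' published baseline implementation; the sample rate, FFT size, hop length, and number of mel bands are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Convert frequency in Hz to the mel scale (O'Shaughnessy formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters spaced evenly on the mel scale, one row per band.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_energies(signal, sr=44100, n_fft=2048, hop=1024, n_mels=40):
    # Frame the signal, take magnitude spectra, apply the mel filterbank,
    # and take the log; yields one feature vector per frame.
    window = np.hamming(n_fft)
    n_frames = 1 + max(0, (len(signal) - n_fft) // hop)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n_fft))        # (n_frames, n_fft//2+1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(mel + 1e-10)                       # floor avoids log(0)

# Example: features for one second of noise at the assumed sample rate.
feats = log_mel_energies(np.random.randn(44100))
print(feats.shape)  # → (42, 40)
```

In the baseline described by the paper, frames of features like these would then be fed to a multilayer perceptron classifier; the exact parameters and network architecture are specified in the paper itself.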

Citations of this paper

Sound Event Detection in the DCASE 2017 Challenge
TLDR
Analysis of the systems behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance.
DCASE 2018 Challenge - Task 5: Monitoring of domestic activities based on multi-channel acoustics
TLDR
The setup of Task 5 is presented, including the task description, dataset, and baseline system, which is intended to lower the hurdle to participating in the challenge and to provide a reference performance.
THE SEIE-SCUT SYSTEMS FOR CHALLENGE ON DCASE 2018 : DEEP LEARNING TECHNIQUES FOR AUDIO REPRESENTATION AND CLASSIFICATION
TLDR
Evaluated on the development datasets of DCASE 2018, the systems presented are superior to the corresponding baselines for tasks 1b and 1a.
DEEP LEARNING FOR DCASE 2017 CHALLENGE
This paper reports our results on all tasks of the DCASE 2017 challenge, which are acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars.
DCASE 2018 Challenge baseline with convolutional neural networks
TLDR
DCASE 2018 has five tasks: 1) acoustic scene classification, 2) general-purpose audio tagging, 3) bird audio detection, 4) weakly-labeled semi-supervised sound event detection, and 5) multi-channel audio tagging; the baseline source code contains a Python implementation of convolutional neural networks, including AlexNetish and VGGish networks originating from computer vision.
Acoustic Scene Classification: An Overview of Dcase 2017 Challenge Entries
TLDR
Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel frequency representations in acoustic scene classification, and indicates that combinations of top systems are capable of reaching close to perfect performance on the given data.
AUDIO FEATURES IN A FUSION-BASED FRAMEWORK FOR ACOUSTIC SCENE CLASSIFICATION
TLDR
Two submissions for the Acoustic Scene Classification (ASC) task of the IEEE AASP challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 are described, each based on a score-level fusion of well-known spectral audio features.
THE SEIE-SCUT SYSTEMS FOR IEEE AASP CHALLENGE ON DCASE 2017 : DEEP LEARNING TECHNIQUES FOR AUDIO REPRESENTATION AND CLASSIFICATION
TLDR
Evaluated on the development datasets of DCASE 2017, the systems are superior to the corresponding baselines for tasks 1 and 2, and the system for task 3 performs as well as the baseline in terms of the predominant metrics.
DCASE 2018 Challenge Surrey cross-task convolutional neural network baseline
TLDR
A cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a “CNN Baseline” system that implements CNNs with 4 layers and 8 layers originating from AlexNet and VGG from computer vision.
CLASSIFYING SHORT ACOUSTIC SCENES WITH I-VECTORS AND CNNS : CHALLENGES AND OPTIMISATIONS FOR THE 2017 DCASE ASC TASK
TLDR
The result of the CP-JKU team’s experiments is a classification system that achieves classification accuracies of around 90% on the provided development data, as estimated via the prescribed four-fold cross-validation scheme.

References

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording
TLDR
The work on Task 1, Acoustic Scene Classification, and Task 3, Sound Event Detection in Real Life Recordings, uses low-level and high-level features, classifier optimization, and other heuristics specific to each task.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)
TLDR
This paper proposes an evolutionary approach to automatically generate a suitable neural network architecture and hyperparameters for any given classification problem and takes the DCASE 2018 Challenge as an opportunity to evaluate this approach.
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models are presented.
An exemplar-based NMF approach to audio event detection
TLDR
A novel, exemplar-based method for audio event detection based on non-negative matrix factorisation, which models events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events.
Consumer-level multimedia event detection through unsupervised audio signal modeling
TLDR
A novel acoustic characterization approach to the multimedia event detection (MED) task for unconstrained and unstructured consumer-level videos through audio signal modeling that better accounts for temporal dependencies than previously proposed MFCC bag-of-words approaches.
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
TLDR
A shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multilabel classification task, and a symmetric or asymmetric deep denoising auto-encoder (syDAE or asyDAE) to generate new data-driven features from the logarithmic Mel-filter bank features.
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
TLDR
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations
TLDR
A method that bypasses the supervised construction of class models is presented, which learns the components as a non-negative dictionary in a coupled matrix factorization problem, where the spectral representation and the class activity annotation of the audio signal share the activation matrix.
Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments
TLDR
An efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation in sound classification model training, which requires significantly fewer labeled instances.
Detecting audio events for semantic video search
TLDR
The experiments with SVM classifiers, and different features, using a 290-hour corpus of sound effects, which allowed us to build detectors for almost 50 semantic concepts, showed that the task is much harder in real-life videos, which so often include overlapping audio events.