Corpus ID: 221089852

MULTI-TASK LEARNING AND POST PROCESSING OPTIMIZATION FOR SOUND EVENT DETECTION Technical Report

@inproceedings{Cances2019MULTITASKLA,
  title={MULTI-TASK LEARNING AND POST PROCESSING OPTIMIZATION FOR SOUND EVENT DETECTION Technical Report},
  author={L{\'e}o Cances and Thomas Pellegrini and Patrice Guyot},
  year={2019}
}
In this paper, we report our experiments in Sound Event Detection in domestic environments in the framework of the DCASE 2019 Task 4 challenge. The novelty, this year, lies in the availability of three different subsets for development: a weakly annotated dataset, a strongly annotated synthetic subset, and an unlabeled subset. The weak annotations, unlike the strong ones, provide tags from audio events but do not provide temporal boundaries. The task objective is twofold: detecting audio events… 

References

Evaluation of Post-Processing Algorithms for Polyphonic Sound Event Detection
TLDR
This paper evaluated different approaches for temporal segmentation, namely statistics-based and parametric methods, and compared post-processing algorithms on the temporal prediction curves of two models: one based on the challenge’s baseline and one based on Multiple Instance Learning (MIL).
Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments
TLDR
This paper presents DCASE 2018 task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries) and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.
Audio Set: An ontology and human-labeled dataset for audio events
TLDR
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Unsupervised Representation Learning by Predicting Image Rotations
TLDR
This work proposes to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input, and demonstrates both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning.
Multitask Learning
  • R. Caruana
  • Computer Science
    Encyclopedia of Machine Learning and Data Mining
  • 1998
TLDR
Prior work on MTL is reviewed, new evidence that MTL in backprop nets discovers task relatedness without the need of supervisory signals is presented, and new results for MTL with k-nearest neighbor and kernel regression are presented.
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
Paper presented at the 18th International Society for Music Information Retrieval Conference, held in Suzhou, China, October 23 to 27, 2017.
On genetic algorithms
TLDR
Culling is near optimal for this problem, highly noise tolerant, and the best known approach in some regimes, and some new large deviation bounds on this submartingale enable us to determine the running time of the algorithm.