General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
@inproceedings{Fonseca2018GeneralpurposeTO,
  title={General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline},
  author={Eduardo Fonseca and Manoj Plakal and Frederic Font and Daniel P. W. Ellis and Xavier Favory and Jordi Pons and Xavier Serra},
  booktitle={DCASE},
  year={2018}
}
This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology. We present the task, the dataset prepared for the competition, and a baseline…
96 Citations
Audio tagging with noisy labels and minimal supervision
- Computer Science
- DCASE
- 2019
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
DCASE 2018 task 2: iterative training, label smoothing, and background noise normalization for audio event tagging
- Computer Science
- DCASE
- 2018
This paper describes one of the submissions to DCASE 2018 Task 2 (general-purpose audio tagging of Freesound content with AudioSet labels), and proposes using pseudo-labels for automatic label verification and label smoothing to reduce over-fitting.
MULTI-LABEL AUDIO TAGGING SYSTEM FOR FREESOUND 2019: FOCUSING ON NETWORK ARCHITECTURES, LABEL NOISY AND LOSS FUNCTIONS Technical Report
- Computer Science
- 2019
This technical report proposes model architectures, based on convolutional and recurrent networks, that can efficiently tag audio with multiple and noisy labels in order to unify the detection of audio events.
The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging
- Computer Science
- DCASE
- 2018
A neural network system for DCASE 2018 Task 2, general-purpose audio tagging, is presented; it outperforms the baseline result of 0.704 and ranks in the top 8% of the public leaderboard.
General-purpose audio tagging by ensembling convolutional neural networks based on multiple features
- Computer Science
- DCASE
- 2018
This paper describes an audio tagging system that participated in Task 2 “General-purpose audio tagging of Freesound content with AudioSet labels” of the “Detection and Classification of Acoustic…
General-purpose audio tagging from noisy labels using convolutional neural networks
- Computer Science
- DCASE
- 2018
A system using an ensemble of convolutional neural networks trained on log-scaled mel spectrograms to address general-purpose audio tagging challenges and to reduce the effects of label noise is proposed.
Meta learning based audio tagging
- Computer Science
- DCASE
- 2018
This paper describes a solution for the general-purpose audio tagging task, one of the subtasks of the DCASE 2018 challenge, and proposes a meta-learning-based ensemble method that can provide higher prediction accuracy and robustness than a single model.
Weakly Labelled AudioSet Tagging With Attention Neural Networks
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2019
This work connects attention neural networks with multiple instance learning (MIL) methods, and proposes decision-level and feature-level attention neural networks for audio tagging, which achieve a state-of-the-art mean average precision.
FSD50K: An Open Dataset of Human-Labeled Sound Events
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2022
FSD50K is introduced, an open dataset containing over 51 k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster SER research.
General audio tagging with ensembling convolutional neural network and statistical features
- Computer Science
- The Journal of the Acoustical Society of America
- 2019
An ensemble learning framework is applied to ensemble statistical features and the outputs from the deep classifiers, with the goal to utilize complementary information to address the noisy label problem within the framework.
References
Audio Set: An ontology and human-labeled dataset for audio events
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System
- Computer Science, Physics
- DCASE
- 2017
This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
DOMESTIC AUDIO TAGGING WITH CONVOLUTIONAL NEURAL NETWORKS
- Computer Science
- 2016
The use of convolutional neural networks (CNNs) to label audio signals recorded in a domestic (home) environment is investigated, and a relative 23.8% improvement over the Gaussian mixture model (GMM) baseline method is observed on the development dataset for the challenge.
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
- Computer Science
- ISMIR
- 2017
Paper presented at the 18th International Society for Music Information Retrieval Conference, held in Suzhou, China, from 23 to 27 October 2017.
Chime-home: A dataset for sound source recognition in a domestic environment
- Computer Science
- 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2015
The annotation approach associates each 4-second excerpt from the audio recordings with multiple labels, based on a set of 7 labels associated with sound sources in the acoustic environment, to obtain a representation of 'ground truth' in annotations.
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
A shrinking deep neural network (DNN) framework incorporating unsupervised feature learning is proposed to handle the multilabel classification task, along with a symmetric or asymmetric deep denoising auto-encoder (syDAE or asyDAE) to generate new data-driven features from the logarithmic Mel-filter bank features.
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2018
The emergence of deep learning as the most popular classification method is observed, replacing the traditional approaches based on Gaussian mixture models and support vector machines.
Freesound technical demo
- Computer Science
- ACM Multimedia
- 2013
This demo introduces Freesound to the multimedia community and shows its potential as a research resource.
SoundNet: Learning Sound Representations from Unlabeled Video
- Computer Science
- NIPS
- 2016
This work proposes a student-teacher training procedure which transfers discriminative visual knowledge from well established visual recognition models into the sound modality using unlabeled video as a bridge, and suggests some high-level semantics automatically emerge in the sound network, even though it is trained without ground truth labels.
CNN architectures for large-scale audio classification
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This work uses various CNN architectures to classify the soundtracks of a dataset of 70M training videos with 30,871 video-level labels, and investigates varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on the authors' audio classification task, and larger training and label sets help up to a point.