• Corpus ID: 40501028

Freesound Datasets: A Platform for the Creation of Open Audio Datasets

Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andrés Ferraro, Sergio Oramas, Alastair Porter and Xavier Serra
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), Suzhou, China, 23–27 October 2017.


General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
AudioCaps: Generating Captions for Audios in The Wild
A large-scale dataset of 46K audio clips with human-written text pairs collected via crowdsourcing on the AudioSet dataset is contributed and two novel components that help improve audio captioning performance are proposed: the top-down multi-scale encoder and aligned semantic attention.
ARCA23K: An audio dataset for investigating open-set label noise
It is shown that the majority of labelling errors in ARCA23K are due to out-of-vocabulary audio clips, and this type of label noise is referred to as open-set label noise.
DCASE 2018 task 2: iterative training, label smoothing, and background noise normalization for audio event tagging
This paper describes a submission to DCASE 2018 Task 2 (general-purpose audio tagging of Freesound content with AudioSet labels), proposing pseudo-labels for automatic label verification and label smoothing to reduce overfitting.
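Label smoothing, as used in the submission above, replaces hard one-hot targets with a mixture of the one-hot vector and a uniform distribution over classes. A minimal NumPy sketch (the function name and epsilon value are illustrative, not taken from the paper):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    """Mix one-hot targets with a uniform distribution over classes.

    With eps = 0.1 and 3 classes, the true class gets 0.9 + 0.1/3
    and every other class gets 0.1/3; rows still sum to 1.
    """
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / n_classes

# One-hot target for class 0 out of 3 classes.
y = np.eye(3)[0]
print(smooth_labels(y, eps=0.1))  # approx. [0.933 0.033 0.033]
```

The softened targets penalize over-confident predictions, which is why the technique is often paired with noisy labels as in the DCASE 2018 setting.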
The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging
A neural network system for DCASE 2018 Task 2 (general-purpose audio tagging) is presented; it outperforms the baseline score of 0.704 and achieves a top-8% position on the public leaderboard.
Multichannel-based Learning for Audio Object Extraction
  • D. Arteaga, Jordi Pons
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
A novel deep-learning approach to audio object extraction is proposed that learns from multichannel renders of object-based productions rather than directly from the audio objects themselves.
Audio tagging with noisy labels and minimal supervision
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
DALI: A Large Dataset of Synchronized Audio, Lyrics and Notes, Automatically Created Using a Teacher-Student Machine Learning Paradigm
DALI is introduced, a large and rich multimodal dataset containing 5358 audio tracks with their time-aligned vocal melody notes and lyrics at four levels of granularity; it is shown that this allows the performance of singing voice detection (SVD) to be progressively improved, yielding better audio matching and alignment.
The Impact of Label Noise on a Music Tagger
It is shown that carefully annotated labels result in highest figures of merit, but even high amounts of noisy labels contain enough information for successful learning.
A number of neural network architectures that learn from log-mel spectrogram inputs are proposed, using preprocessing techniques, data augmentation, loss-function weighting, and pseudo-labeling to improve performance.
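The log-mel spectrogram input mentioned above warps the linear frequency axis onto the perceptual mel scale and then compresses magnitudes logarithmically. A minimal NumPy sketch of those two ingredients, assuming the common HTK mel formula (all function names and default parameters are illustrative):

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style mel scale: 2595 * log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels=40, fmin=0.0, fmax=8000.0):
    """Edge frequencies of n_mels triangular filters,
    equally spaced on the mel scale between fmin and fmax."""
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    return mel_to_hz(mels)

def log_compress(mel_power, eps=1e-10):
    """Log compression applied to a mel power spectrogram."""
    return np.log(mel_power + eps)

edges = mel_band_edges(n_mels=40, fmax=8000.0)
print(edges[0], edges[-1])  # endpoints are fmin and fmax
```

In practice the filterbank built from these edges is applied to an STFT power spectrogram before `log_compress`; libraries such as librosa wrap all of these steps.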


Good-sounds.org: A Framework to Explore Goodness in Instrumental Sounds
Paper presented at the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, USA, 7–11 August 2016.
Essentia: An Audio Analysis Library for Music Information Retrieval
Paper presented at the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil, 4–8 November 2013.
An Open Dataset for Research on Audio Field Recording Archives: freefield1010
A free and open dataset of 7690 audio clips sampled from the field-recording tag in the Freesound audio archive is introduced; the paper describes the data preparation process, characterises the dataset, and illustrates its use through an auto-tagging experiment.
Freesound technical demo
This demo introduces Freesound to the multimedia community and shows its potential as a research resource.
TagATune: A Game for Music and Sound Annotation
The rationale, design, and preliminary results from a pilot study using a prototype of TagATune to label a subset of the Freesound database are presented; the game aims to extract descriptions of sounds and music from human players.
A Survey of Evaluation in Music Genre Recognition
This paper compiles a bibliography of work in MGR, and analyzes three aspects of evaluation: experimental designs, datasets, and figures of merit.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
The Million Song Dataset
The Million Song Dataset, a freely-available collection of audio features and metadata for a million contemporary popular music tracks, is introduced and positive results on year prediction are shown, and the future development of the dataset is discussed.
ESC: Dataset for Environmental Sound Classification
A new annotated collection of 2000 short clips comprising 50 classes of various common sound events, and an abundant unified compilation of 250000 unlabeled auditory excerpts extracted from recordings available through the Freesound project are presented.
End-to-end learning for music audio
  • S. Dieleman, B. Schrauwen
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
Although the convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.