Corpus ID: 53768595

Facilitating the Manual Annotation of Sounds When Using Large Taxonomies

Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra
Properly annotated multimedia content is crucial for supporting advances in many Information Retrieval applications. It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections. In the context of everyday sounds and online collections, the content to describe is very diverse and involves many different types of concepts, often organised in large hierarchical structures called taxonomies. This makes the task of manually annotating… 


FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K is introduced, an open dataset containing over 51k audio clips totalling over 100 h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster sound event recognition (SER) research.
Voice-based interface for accessible soundscape composition: composing soundscapes by vocally querying online sounds repositories
An Internet of Audio Things ecosystem is presented that supports soundscape composition via vocal interactions, combining a commercial voice-based interface with Freesound.org, a cloud-based repository of audio content.
Audio tagging with noisy labels and minimal supervision
This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.
COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations
The results are promising, sometimes on par with the state of the art in the considered tasks, and the embeddings produced with the method correlate well with some acoustic descriptors.


Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
HT06, tagging paper, taxonomy, Flickr, academic article, to read
A model of tagging systems, specifically in the context of web-based systems, is offered to help illustrate the possible benefits of these tools, and a simple taxonomy of incentives and contribution models is provided to inform potential evaluative frameworks.
A taxonomy of musical genres
This work describes a novel music genre taxonomy based on a few guiding principles, and reports on the process of building this taxonomy.
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
A Dataset and Taxonomy for Urban Sound Research
A taxonomy of urban sounds and a new dataset, UrbanSound, containing 27 hours of audio with 18.5 hours of annotated sound event occurrences across 10 sound classes are presented.
Freesound technical demo
This demo introduces Freesound to the multimedia community and shows its potential as a research resource.
What in the World Do We Hear? An Ecological Approach to Auditory Event Perception
Everyday listening is the experience of hearing events in the world rather than sounds per se. In this article, I take an ecological approach to everyday listening to overcome constraints on its…
The Soundscape: Our Sonic Environment and the Tuning of the World
Schafer advocates soundscape study, or interdisciplinary research on the sonic environment that combines science, society, and the arts. Extensively quoting literature, he gives an historical…
ImageNet Large Scale Visual Recognition Challenge
The creation of this benchmark dataset and the advances in object recognition it has enabled are described, and state-of-the-art computer vision accuracy is compared with human accuracy.