• Corpus ID: 31638994

The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database

Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo
This paper presents a collection of annotations (tags or keywords) for a set of 2,133 environmental sounds taken from the Freesound database (www.freesound.org). The annotations were acquired through an open-ended crowd-labeling task, in which participants were asked to provide keywords for each of three sounds. The main goal of this study is to find out (i) whether it’s feasible to collect keywords for a large collection of sounds through crowdsourcing, and (ii) how people talk about sounds…


A Case for a Range of Acceptable Annotations
It is proposed that there exists a class of annotations between these two categories that exhibit acceptable variation, which is defined as the range of annotations for a given item that meet the standard of quality for a task.
Exploiting Disagreement Through Open-Ended Tasks for Capturing Interpretation Spaces
This research investigates how the complete interpretation space of humans about the content and context of this data can be captured, using open-ended crowdsourcing tasks that optimize the capturing of multiple interpretations combined with disagreement based metrics for evaluation of the results.
Crowdsourcing Ambiguity-Aware Ground Truth
It is proved that capturing disagreement is essential for acquiring a high quality ground truth by comparing the quality of the data aggregated with CrowdTruth metrics with majority vote, a method which enforces consensus among annotators.
Empirical Methodology for Crowdsourcing Ground Truth
This work shows that measuring disagreement is essential for acquiring a high quality ground truth, by comparing the quality of the data aggregated with CrowdTruth metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical Relation Extraction, Twitter Event Identification, News Event Extraction and Sound Interpretation.
An Overview on Audio, Signal, Speech, & Language Processing for COVID-19
The contributions from non-speech modalities that may complement or serve as inspiration for audio and speech analysis and the observations with respect to solution usability, challenges, and the significant technology achievements are discussed.
Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview
An overview is given of work on speech and other audio signals, language, and general signal processing that uses Artificial Intelligence techniques to screen, diagnose, and monitor COVID-19 and to spread awareness of it.
Freesound Datasets: A Platform for the Creation of Open Audio Datasets
Paper presented at the 18th International Society for Music Information Retrieval Conference, held in Suzhou, China, from 23 to 27 October 2017.
Pragmatic descriptions of perceptual stimuli
A general model of the human image description process is presented, and a road map for future research in automatic image description, and the automatic description of perceptual stimuli in general is proposed.


Sound-based distributional models
The first results of the efforts to build a perceptually grounded semantic model based on sound data collected from freesound.org show that the models are able to capture semantic relatedness, with the tag-based model scoring higher than the sound-based model and the combined model.
The Three Sides of CrowdTruth
This paper investigates the dependence of worker metrics for detecting spam on the quality of sentences in the dataset and the quality of the target semantics, and shows that worker quality metrics can improve significantly when the quality of these other aspects of semantic interpretation is considered.
CrowdTruth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data
This paper introduces the CrowdTruth open-source software framework for machine-human computation, that implements a novel approach to gathering human annotation data for a variety of media, and shows the advantages of using open standards and the extensibility of the framework with new data modalities and annotation tasks.
Multimodal Distributional Semantics
This work proposes a flexible architecture to integrate text- and image-based distributional information, and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene
ImageNet: A large-scale hierarchical image database
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
From Frequency to Meaning: Vector Space Models of Semantics
The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
Labeling images with a computer game
A new interactive system: a game that is fun and can be used to create valuable output that addresses the image-labeling problem and encourages people to do the work by taking advantage of their desire to be entertained.
Freesound technical demo
This demo introduces Freesound to the multimedia community and shows its potential as a research resource.
NESSTI: Norms for Environmental Sound Stimuli
Normative data along multiple cognitive and affective dimensions are provided for a set of 110 sounds, including living and manmade stimuli, to assist researchers in the fields of cognitive and clinical psychology and the neuroimaging community in choosing well-controlled environmental sound stimuli, and to allow comparison across multiple studies.