Corpus ID: 232320688

GISE-51: A scalable isolated sound events dataset

Sarthak Yadav and Mary Ellen Foster
Most existing isolated sound event datasets cover a small number of sound event classes, usually 10 to 15, restricted to a narrow domain such as domestic or urban sound events. In this work, we introduce GISE-51, a dataset spanning 51 isolated sound events belonging to a broad range of event types, derived from FSD50K [1]. We also release GISE-51-Mixtures, a dataset of 5-second soundscapes with hard-labelled event boundaries synthesized from GISE-51 isolated sound events.


Introducing the ReaLISED Dataset for Sound Event Classification

This paper presents the Real-Life Indoor Sound Event Dataset (ReaLISED), a new database developed to contribute to scientific advances by providing a large amount of real, labeled indoor sound event data.

How Much Attention Should we Pay to Mosquitoes?

This paper proposes a novel data-driven approach that leverages Transformer-based models to identify mosquitoes in audio recordings, and shows that this approach outperforms baseline methods on standard evaluation metrics, albeit with unexpectedly high false-negative rates.

Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments

This paper presents DCASE 2018 Task 4, which evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries), and explores the possibility of exploiting a large amount of unbalanced and unlabeled training data together with a small weakly labeled training set to improve system performance.

Sound Event Detection in Domestic Environments with Weakly Labeled Data and Soundscape Synthesis

The paper introduces the Domestic Environment Sound Event Detection (DESED) dataset, which combines part of the previous year's dataset with an additional synthetic, strongly labeled dataset provided this year and described in more detail.

FSD50K: An Open Dataset of Human-Labeled Sound Events

FSD50K is introduced: an open dataset containing over 51k audio clips totalling over 100 hours of audio, manually labeled using 200 classes drawn from the AudioSet Ontology, providing an alternative benchmark dataset to foster SER research.

Audio Set: An ontology and human-labeled dataset for audio events

The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.

Scaper: A library for soundscape synthesis and augmentation

Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined "specification", increasing the variability of the output.
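The probabilistic placement that such a sequencer automates can be sketched minimally as follows. This is not Scaper's actual API; the uniform onset/SNR specifications, the sample rate, and the toy event names below are illustrative assumptions, shown here only to make the "specification → many soundscapes" idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
SR = 16000          # assumed sample rate (Hz)
DURATION = 5.0      # soundscape length, as in GISE-51-Mixtures

def mix_soundscape(events, n_events=3):
    """Mix isolated events into one soundscape: each placement is
    drawn from a probabilistic 'specification' (uniform onset,
    uniform SNR), and hard event boundaries are returned as labels."""
    out = rng.normal(0, 1e-4, int(SR * DURATION))  # faint noise bed
    labels = []
    for _ in range(n_events):
        name, clip = events[rng.integers(len(events))]
        onset = rng.uniform(0, DURATION - len(clip) / SR)  # uniform onset spec
        snr_db = rng.uniform(6, 30)                        # uniform SNR spec
        # scale the event relative to the current background level
        gain = 10 ** (snr_db / 20) * np.std(out) / (np.std(clip) + 1e-12)
        start = int(onset * SR)
        out[start:start + len(clip)] += gain * clip
        labels.append((name, onset, onset + len(clip) / SR))
    return out, labels

# toy isolated events: a 0.5 s tone and a 0.3 s noise burst
t = np.arange(int(0.5 * SR)) / SR
events = [("tone", np.sin(2 * np.pi * 440 * t)),
          ("burst", rng.normal(0, 1, int(0.3 * SR)))]
audio, labels = mix_soundscape(events)
```

Re-running the sampler with a different seed yields a different soundscape from the same specification, which is exactly how a single specification increases output variability.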

Recurrent neural networks for polyphonic sound event detection in real life recordings

In this paper we present an approach to polyphonic sound event detection in real-life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs).

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won 1st place in the large-scale weakly supervised sound event detection task of DCASE 2017.

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios.

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition

Learning sounds in adverse situations, such as from weakly labeled and/or noisily labeled data, is harder; in these situations a single stage of learning is not sufficient, so a sequential stage-wise learning process is proposed that improves the generalization capabilities of a given modeling system.

Audio tagging with noisy labels and minimal supervision

This paper presents the task setup, the FSDKaggle2019 dataset prepared for this scientific evaluation, and a baseline system consisting of a convolutional neural network.