Active Learning for Sound Event Detection

Zhao Shuyang, Toni Heittola, Tuomas Virtanen
IEEE/ACM Transactions on Audio, Speech, and Language Processing
This article proposes an active learning system for sound event detection (SED). It aims to maximize the accuracy of a learned SED model with limited annotation effort. The proposed system analyzes an initially unlabeled audio dataset, from which it selects sound segments for manual annotation. The candidate segments are generated by a proposed change point detection approach, and the selection follows the principle of mismatch-first farthest-traversal. During the training of SED…

Visual Active Learning for Labeling: A Case for Soundscape Ecology Data

Reports the development of a methodology and framework to support labeling, with an application case as background, which performs visual active learning and label propagation, using 2D embeddings as layouts, to achieve faster and interactive labeling of samples.

Active Learning with Positive and Negative Pairwise Feedback

A generic framework for active clustering with queries for pairwise similarities between objects, which can be any positive or negative number, yielding full flexibility in the type of feedback that a user/annotator can provide.

Automatic Behavior Assessment from Uncontrolled Everyday Audio Recordings by Deep Learning

This work investigates whether social behavior and environments can automatically be coded based on uncontrolled everyday audio recordings by applying deep learning, and suggests that certain aspects of social behavior and environments can be automatically classified.

Beyond Sensor Data Analysis: Unexpected Challenges in a Honeybee Monitoring Project

The challenges and problems which arose in the honeybee monitoring project are collected and summarized to serve as an aid for planning and executing data science projects in the agricultural domain.

Latent Neural Stochastic Differential Equations for Change Point Detection

This paper introduces an application of latent neural stochastic differential equations to the change point detection problem and demonstrates the detection capabilities and performance of the model on a range of synthetic and real-world datasets and benchmarks.

Sound Event Detection for Human Safety and Security in Noisy Environments

A sound anomaly detection system based on a fully convolutional network, which exploits image spatial filtering and an Atrous Spatial Pyramid Pooling module and outperforms both state-of-the-art methods and general-purpose deep learning solutions.

Human–machine collaboration based sound event detection

This paper proposes an approach of human–machine collaboration based SED (HMSED), and uses a group of median filters with adaptive window size in the post-processing of output probabilities of the model.

FSD50K: An Open Dataset of Human-Labeled Sound Events

FSD50K is introduced, an open dataset containing over 51 k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology, to provide an alternative benchmark dataset and thus foster SER research.

Improving the quality control of seismic data through active learning

A novel active learning methodology to sequentially select the most relevant data, which are then given back to a human expert for labeling, supported by strong empirical evidence as illustrated by the numerical experiments presented in this article.

Iterative weighted active transfer learning hyperspectral image classification

This work combines active learning (AL) and transfer learning, and proposes an iterative weighted framework based on active transfer learning to solve the optimal reconstruction matrix and projection matrix by minimizing the reconstruction error.



Müller cell metabolic chaos during retinal degeneration.

Enzyme‐linked immunosorbent assay for human autoantibody to glial fibrillary acidic protein: higher titer of the antibody is detected in serum of patients with Alzheimer's disease

An enzyme‐linked immunosorbent assay (ELISA) to detect anti‐glial fibrillary acidic protein (GFAP) autoantibody in human sera is developed, and it is suggested that the evaluation of the anti‐GFAP autoantibody level may be useful in diagnosing Alzheimer's disease.

Comparative Sequence Analysis of the tuf and recA Genes and Restriction Fragment Length Polymorphism of the Internal Transcribed Spacer Region Sequences Supply Additional Tools for Discriminating Bifidobacterium lactis from Bifidobacterium animalis

The bifidobacterial strains investigated could be divided into two distinct groups within a single species based on the tuf, recA, and 16S-23S spacer region sequence analysis and could be unified as the species B. animalis.

Polyhydroxyalkanoates as a source of chemicals, polymers, and biofuels.

Composite Materials Based on EN AW-Al Cu4Mg1(A) Aluminum Alloy Reinforced with the Ti(C,N) Ceramic Particles

Investigations of composite materials based on EN AW-Al Cu4Mg1(A) aluminum alloy reinforced with the Ti(C,N) particles with various weight ratios of 5, 10, and 15% are presented. Powders of the

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network

In this paper, we present a gated convolutional neural network and a temporal attention-based localization method for audio classification, which won the 1st place in the large-scale weakly

Audio Set: An ontology and human-labeled dataset for audio events

The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.

An Active Learning Method Using Clustering and Committee-Based Sample Selection for Sound Event Classification

The proposed method performs K-medoids clustering over an initially unlabeled dataset, and the medoids, as local representatives, are presented to an annotator for manual annotation; it outperforms other active learning algorithms proposed for sound event classification in all the experiments.

Weakly Labelled AudioSet Tagging With Attention Neural Networks

This work bridges the connection between attention neural networks and multiple instance learning (MIL) methods, and proposes decision-level and feature-level attention neural networks for audio tagging, which achieve a state-of-the-art mean average precision.

Synovial chondromatosis of the temporomandibular joint extending to temporalis, masticator, and parotid spaces.

A case of synovial chondromatosis of the TMJ with extraarticular extension that was diagnosed with MRI and CT; histopathologic evaluation indicated that this case was synovial chondromatosis in the intermediate phase.