Context-dependent sound event detection
  • Toni Heittola, Annamaria Mesaros, Antti J. Eronen, Tuomas Virtanen
  • EURASIP Journal on Audio, Speech, and Music Processing
The work presented in this article studies how context information can be used in the automatic sound event detection process, and how the detection system can benefit from such information. Humans use context information to make more accurate predictions about sound events and to rule out unlikely events given the context. We propose a similar use of context information in automatic sound event detection. The proposed approach is composed of two stages…
Detection of overlapping acoustic events using a temporally-constrained probabilistic model
Results show that the proposed system outperforms several state-of-the-art methods for overlapping acoustic event detection on the same task, using both frame-based and event-based metrics, and is robust to varying event density and noise levels.
Polyphonic Sound Event Detection with Weak Labeling
This thesis proposes to train deep learning models for SED using various levels of weak labeling, and shows that the sound events can be learned and localized by a recurrent neural network (RNN) with a connectionist temporal classification (CTC) output layer, which is well suited for sequential supervision.
Duration-Controlled LSTM for Polyphonic Sound Event Detection
This paper builds upon a state-of-the-art SED method that performs frame-by-frame detection using a bidirectional LSTM recurrent neural network, and incorporates a duration-controlled modeling technique based on a hidden semi-Markov model that makes it possible to model the duration of each sound event precisely and to perform sequence-by-sequence detection without having to resort to thresholding.
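As a much simpler stand-in for the HSMM duration modeling described in this entry, duration constraints are often approximated by post-filtering frame-wise binary output so that implausibly short activations are discarded. The sketch below is illustrative only; the function name and the frame threshold are assumptions, not part of the paper.

```python
def enforce_min_duration(frames, min_len=5):
    """Zero out active runs shorter than min_len frames in a binary
    frame-wise activity sequence (list of 0/1 values)."""
    out = list(frames)
    n = len(out)
    i = 0
    while i < n:
        if out[i] == 1:
            j = i
            while j < n and out[j] == 1:  # find the end of this active run
                j += 1
            if j - i < min_len:           # run too short: treat as spurious
                for k in range(i, j):
                    out[k] = 0
            i = j
        else:
            i += 1
    return out
```

Unlike the HSMM approach, this filter ignores event identity and models no duration distribution; it only encodes a hard minimum length.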
A first attempt at polyphonic sound event detection using connectionist temporal classification
  • Yun Wang, Florian Metze
    2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
This paper presents a first attempt at using Connectionist temporal classification (CTC) for sound event detection, and shows that CTC is able to locate the boundaries of sound events on a very noisy corpus of consumer generated content with rough hints about their positions.
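The CTC output layer mentioned in this entry maps a frame-level label path to an event sequence by a fixed many-to-one rule: merge consecutive repeated symbols, then delete blanks. A minimal sketch of that collapsing rule (the blank symbol here is an arbitrary choice):

```python
def ctc_collapse(path, blank="-"):
    """Apply CTC's many-to-one mapping to a per-frame label path:
    first merge consecutive repeats, then drop blank symbols."""
    merged = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    return [s for s in merged if s != blank]
```

This is why CTC needs only the order of events as supervision: many frame alignments collapse to the same event sequence.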
Acoustic Event Detection in Speech Overlapping Scenarios Based on High-Resolution Spectral Input and Deep Learning
A method is presented that learns features in an unsupervised manner from high-resolution spectrogram patches and integrates them within a deep neural network framework to detect and classify acoustic events.
Weakly labeled acoustic event detection using local detector and global classifier
This paper proposes an acoustic event detection framework for weakly supervised data which is labeled with only the existence of events and shows that the proposed model has a lower EER on the CHiME Home dataset than other neural network based models.
Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
Sound event detection (SED) has gained increasing attention with its wide application in surveillance, video indexing, etc. Existing models in SED mainly generate frame-level predictions, converting…
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources
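Among the metrics this entry discusses, the segment-based F-score and error rate are defined by comparing the sets of active event labels in reference and system output within fixed-length segments. A minimal sketch of those two formulas (the function name and input encoding are illustrative):

```python
def segment_metrics(ref, est):
    """Segment-based F-score and error rate for polyphonic SED.
    ref, est: lists of sets of active event labels, one set per segment."""
    tp = fp = fn = subs = dels = ins = 0
    for r, e in zip(ref, est):
        tp += len(r & e)                 # correctly detected labels
        seg_fn = len(r - e)              # missed labels in this segment
        seg_fp = len(e - r)              # false alarms in this segment
        fn += seg_fn
        fp += seg_fp
        s = min(seg_fn, seg_fp)          # a miss paired with a false alarm counts as a substitution
        subs += s
        dels += seg_fn - s
        ins += seg_fp - s
    n_ref = sum(len(r) for r in ref)     # total active labels in the reference
    f = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f, er
```

Note that the error rate can exceed 1.0 when insertions outnumber reference events, which is one reason the paper discusses both metrics side by side.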
Polyphonic Sound Event Tracking Using Linear Dynamical Systems
The proposed system, built around a four-dimensional spectral template dictionary spanning frequency, sound event class, exemplar index, and sound state, outperforms several state-of-the-art methods for the task of polyphonic sound event detection and tracking.
Multi-view representation for sound event recognition
The proposed multi-view representation (MVR) approach is evaluated on three benchmark sound event datasets, namely ESC-50, DCASE2016 Task 2, and DCASE2018 Task 2, and significantly outperforms other approaches proposed in the recent literature on the SER task.


Latent semantic analysis in sound event detection
Using probabilistic latent semantic analysis to model the co-occurrence of overlapping sound events in audio recordings from everyday environments such as offices, streets, or shops increases event detection accuracy to 35%, compared to 30% with uniform event priors.
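The general mechanism behind this entry and the context-dependent detection work above is to weight per-event acoustic scores by context-conditional event priors instead of a uniform prior. The sketch below illustrates only that reweighting step, not the paper's PLSA model; all names and numbers are hypothetical.

```python
def rerank_with_priors(likelihoods, priors, context):
    """Weight per-event acoustic likelihoods by context-conditional
    event priors and renormalize, so unlikely events in a given
    context are suppressed."""
    posterior = {ev: lik * priors[context].get(ev, 0.0)
                 for ev, lik in likelihoods.items()}
    total = sum(posterior.values()) or 1.0   # avoid division by zero
    return {ev: p / total for ev, p in posterior.items()}
```

With uniform priors the ranking is unchanged; informative priors shift probability mass toward events that plausibly occur in the recognized context.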
Acoustic event detection in real life recordings
A system for acoustic event detection in recordings from real-life environments, using a network of hidden Markov models, is capable of recognizing almost one third of the events; the temporal positioning of the events is incorrect 84% of the time.
Real-world acoustic event detection
Sound Event Detection in Multisource Environments Using Source Separation
This paper proposes a sound event detection system for natural multisource environments, using a sound source separation front-end, with a significant increase in event detection accuracy compared to a system able to output a single sequence of events.
Audio context recognition using audio event histograms
This paper presents a method for audio context recognition, meaning classification between everyday environments. The method is based on representing each audio context using a histogram of audio events.
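The histogram representation in this entry can be sketched in a few lines: count the detected event labels in a recording, normalize, and compare against per-context reference histograms. The vocabulary, similarity measure, and nearest-neighbour decision below are illustrative assumptions, not the paper's exact pipeline.

```python
from collections import Counter
from math import sqrt

def context_histogram(events, vocab):
    """Normalized histogram of detected event labels over a fixed vocabulary."""
    counts = Counter(events)
    total = sum(counts.values()) or 1
    return [counts[v] / total for v in vocab]

def cosine(a, b):
    """Cosine similarity between two histograms."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def classify_context(events, references, vocab):
    """Assign the context whose reference histogram is most similar."""
    h = context_histogram(events, vocab)
    return max(references, key=lambda c: cosine(h, references[c]))
```

The appeal of this representation is that it depends only on event labels, so any upstream event detector can feed it.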
Events Detection for an Audio-Based Surveillance System
The automatic shot detection system presented is based on a novelty detection approach that detects abnormal audio events in continuous audio recordings of public places; it also takes advantage of potential similarity between the acoustic signatures of different types of weapons by building a hierarchical classification system.
Disambiguating Sounds through Context
It is shown that the use of knowledge in a dynamic network model can improve automatic sound identification, by reducing the search space of the low-level audio features.
HMM-Based Acoustic Event Detection with AdaBoost Feature Selection
This work proposes using the Kullback-Leibler distance to quantify the discriminant capability of all speech feature components in acoustic event detection, and uses AdaBoost to select a discriminant feature set that outperforms classical speech feature sets such as MFCCs in one-pass HMM-based acoustic event detection.
Acoustic Event Detection and Classification
The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the…
Text-Like Segmentation of General Audio for Content-Based Retrieval
Experimental evaluation performed on a representative data set consisting of 5 h of diverse audio data streams indicated that the proposed approach is more effective than the traditional low-level feature-based approaches in solving the posed audio scene segmentation problem.