Metrics for Polyphonic Sound Event Detection

Annamaria Mesaros, Toni Heittola, Tuomas Virtanen
Applied Sciences
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech… 

Trainable COPE Features for Sound Event Detection
A system for the detection of audio events based on trainable COPE (Combination of Peaks of Energy) features; it is flexible in that new features can be easily added to the feature set.
Polyphonic Sound Event and Sound Activity Detection: A Multi-Task Approach
A joint model approach that improves the temporal localization of sound events using a multi-task learning setup; it can alleviate false positive (FP) and false negative (FN) errors and improve both the segment-wise and the event-wise metrics.
Sound Event Envelope Estimation in Polyphonic Mixtures
This paper proposes to estimate the amplitude envelopes of target sound event classes in polyphonic mixtures, and shows that envelope estimation allows good modeling of sound activity, with detection results comparable to the current state of the art.
Using Sequential Information in Polyphonic Sound Event Detection
This paper proposes to use delayed predictions of event activities as additional input features that are fed back to the neural network, to build N-grams that model the co-occurrence probabilities of different events, and to use a sequential loss to train neural networks.
A Comprehensive Review of Polyphonic Sound Event Detection
This paper aims to provide an in-depth discussion of different methodologies proposed by various authors that include the features used, detection algorithms, and their corresponding accuracy and limitations.
Augmented Strategy For Polyphonic Sound Event Detection
  • Bolun Wang, Zhonghua Fu, Hao Wu
  • Computer Science
    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2019
An augmented strategy for polyphonic sound event classification is proposed that includes data augmentation to enrich the training set and eliminate data imbalance, a new loss function that combines cross entropy and F-score, and model fusion to combine the strengths of different classifiers.
Polyphonic Sound Event Detection with Weak Labeling
This thesis proposes to train deep learning models for SED using various levels of weak labeling, and shows that the sound events can be learned and localized by a recurrent neural network (RNN) with a connectionist temporal classification (CTC) output layer, which is well suited for sequential supervision.
A Framework for the Robust Evaluation of Sound Event Detection
A new framework for performance evaluation of polyphonic sound event detection (SED) systems is defined, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates and introduces a definition of event detection that is more robust against labelling subjectivity.
Polyphonic Sound Event Tracking Using Linear Dynamical Systems
The proposed system outperforms several state-of-the-art methods for the task of polyphonic sound event detection and tracking and is modeled around a four-dimensional spectral template dictionary of frequency, sound event class, exemplar index, and sound state.
Duration-Controlled LSTM for Polyphonic Sound Event Detection
This paper builds upon a state-of-the-art SED method that performs frame-by-frame detection using a bidirectional LSTM recurrent neural network, and incorporates a duration-controlled modeling technique based on a hidden semi-Markov model that makes it possible to model the duration of each sound event precisely and to perform sequence-by-sequence detection without having to resort to thresholding.


Acoustic event detection for multiple overlapping similar sources
  • D. Stowell, David Clayton
  • Physics
    2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
  • 2015
A simple method modelling the onsets, durations and offsets of acoustic events to avoid intrinsic limits on polyphony or on inter-event temporal patterns is introduced and evaluated in a case study with over 3000 zebra finch calls.
Context-dependent sound event detection
The two-step approach was found to improve the results substantially compared to the context-independent baseline system, and the detection accuracy can be almost doubled by using the proposed context-dependent event detection.
Acoustic event detection in real life recordings
A system for acoustic event detection in recordings from real-life environments using a network of hidden Markov models; it is capable of recognizing almost one third of the events, and the temporal positioning of the events is not correct for 84% of the time.
Events Detection for an Audio-Based Surveillance System
The automatic shot detection system presented is based on a novelty detection approach which offers a solution to detect abnormality (abnormal audio events) in continuous audio recordings of public places and takes advantage of potential similarity between the acoustic signatures of the different types of weapons by building a hierarchical classification system.
Reliable detection of audio events in highly noisy environments
Polyphonic sound event detection using multi label deep neural networks
Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification in this work, and the proposed method improves the accuracy by 19 percentage points overall.
Supervised model training for overlapping sound events based on unsupervised source separation
Two iterative approaches based on the EM algorithm are proposed to select the stream most likely to contain the target sound, giving a reasonable increase of 8 percentage units in detection accuracy.
Real-world acoustic event detection
Acoustic Event Detection and Classification
The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the
Sound Event Recognition With Probabilistic Distance SVMs
  • H. D. Tran, Haizhou Li
  • Computer Science
    IEEE Transactions on Audio, Speech, and Language Processing
  • 2011
The results show that the proposed classification method significantly outperforms conventional SVM classifiers with Mel-frequency cepstral coefficients (MFCCs) and makes the proposed method an obvious choice for online sound event recognition.