Metrics for Polyphonic Sound Event Detection
@article{Mesaros2016MetricsFP, title={Metrics for Polyphonic Sound Event Detection}, author={Annamaria Mesaros and Toni Heittola and Tuomas Virtanen}, journal={Applied Sciences}, year={2016}, volume={6}, pages={162} }
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech…
383 Citations
Trainable COPE Features for Sound Event Detection
- Computer Science
- CIARP
- 2019
A system for the detection of audio events based on trainable COPE (Combination of Peaks of Energy) features, which is flexible because new features can easily be added to the feature set.
Polyphonic Sound Event and Sound Activity Detection: A Multi-Task Approach
- Computer Science
- 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2019
A joint model approach to improving the temporal localization of sound events using a multi-task learning setup, which can alleviate false positive (FP) and false negative (FN) errors and improve both the segment-wise and the event-wise metrics.
Sound Event Envelope Estimation in Polyphonic Mixtures
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
This paper proposes to estimate the amplitude envelopes of target sound event classes in polyphonic mixtures, and shows that envelope estimation allows good modeling of sound activity, with detection results comparable to the current state of the art.
Using Sequential Information in Polyphonic Sound Event Detection
- Computer Science
- 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC)
- 2018
This paper proposes to use delayed predictions of event activities as additional input features that are fed back to the neural network, build N-grams to model the co-occurrence probabilities of different events, and use sequential loss to train neural networks.
A Comprehensive Review of Polyphonic Sound Event Detection
- Computer Science
- IEEE Access
- 2020
This paper aims to provide an in-depth discussion of different methodologies proposed by various authors that include the features used, detection algorithms, and their corresponding accuracy and limitations.
Augmented Strategy For Polyphonic Sound Event Detection
- Computer Science
- 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- 2019
An augmented strategy for polyphonic sound event classification is proposed that includes data augmentation to enrich the training set and reduce data imbalance, a new loss function that combines cross entropy and F-score, and model fusion to integrate the strengths of different classifiers.
Polyphonic Sound Event Detection with Weak Labeling
- Computer Science
- 2017
This thesis proposes to train deep learning models for SED using various levels of weak labeling, and shows that the sound events can be learned and localized by a recurrent neural network (RNN) with a connectionist temporal classification (CTC) output layer, which is well suited for sequential supervision.
A Framework for the Robust Evaluation of Sound Event Detection
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A new framework for performance evaluation of polyphonic sound event detection (SED) systems is defined, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates and introduces a definition of event detection that is more robust against labelling subjectivity.
Polyphonic Sound Event Tracking Using Linear Dynamical Systems
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
The proposed system outperforms several state-of-the-art methods for the task of polyphonic sound event detection and tracking and is modeled around a four-dimensional spectral template dictionary of frequency, sound event class, exemplar index, and sound state.
Duration-Controlled LSTM for Polyphonic Sound Event Detection
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
This paper builds upon a state-of-the-art SED method that performs frame-by-frame detection using a bidirectional LSTM recurrent neural network, and incorporates a duration-controlled modeling technique based on a hidden semi-Markov model that makes it possible to model the duration of each sound event precisely and to perform sequence-by-sequence detection without having to resort to thresholding.
References
Acoustic event detection for multiple overlapping similar sources
- Physics
- 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- 2015
A simple method modelling the onsets, durations and offsets of acoustic events to avoid intrinsic limits on polyphony or on inter-event temporal patterns is introduced and evaluated in a case study with over 3000 zebra finch calls.
Context-dependent sound event detection
- Computer Science
- EURASIP J. Audio Speech Music. Process.
- 2013
The two-step approach was found to improve the results substantially compared to the context-independent baseline system, and the detection accuracy can be almost doubled by using the proposed context-dependent event detection.
Acoustic event detection in real life recordings
- Physics
- 2010 18th European Signal Processing Conference
- 2010
A system for acoustic event detection in recordings from real-life environments using a network of hidden Markov models; it is capable of recognizing almost one third of the events, while the temporal positioning of the events is not correct for 84% of the time.
Events Detection for an Audio-Based Surveillance System
- Computer Science
- 2005 IEEE International Conference on Multimedia and Expo
- 2005
The automatic shot detection system presented is based on a novelty detection approach which offers a solution to detect abnormality (abnormal audio events) in continuous audio recordings of public places and takes advantage of potential similarity between the acoustic signatures of the different types of weapons by building a hierarchical classification system.
Reliable detection of audio events in highly noisy environments
- Computer Science
- Pattern Recognit. Lett.
- 2015
Polyphonic sound event detection using multi label deep neural networks
- Computer Science
- 2015 International Joint Conference on Neural Networks (IJCNN)
- 2015
In this work, frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification, and the proposed method improves the accuracy by 19 percentage points overall.
Supervised model training for overlapping sound events based on unsupervised source separation
- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
Two iterative approaches based on the EM algorithm are proposed to select the stream most likely to contain the target sound, giving a reasonable increase of 8 percentage units in detection accuracy.
Acoustic Event Detection and Classification
- Physics
- Computers in the Human Interaction Loop
- 2009
The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events (AE), produced either by the human body or by objects handled by humans, so the…
Sound Event Recognition With Probabilistic Distance SVMs
- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2011
The results show that the proposed classification method significantly outperforms conventional SVM classifiers with Mel-frequency cepstral coefficients (MFCCs), making it an obvious choice for online sound event recognition.