Joining Sound Event Detection and Localization Through Spatial Segregation

@article{Trowitzsch2020JoiningSE,
  title={Joining Sound Event Detection and Localization Through Spatial Segregation},
  author={Ivo Trowitzsch and Christopher Schymura and Dorothea Kolossa and Klaus Obermayer},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2020},
  volume={28},
  pages={487-502}
}
Identification and localization of sounds are both integral parts of computational auditory scene analysis. Although each can be solved separately, the goal of forming coherent auditory objects and achieving a comprehensive spatial scene understanding suggests pursuing a joint solution of the two problems. This article presents an approach that robustly binds localization with the detection of sound events in a binaural robotic system. Both tasks are joined through the use of spatial stream… 

Citations

Robust sound event detection in binaural computational auditory scene analysis
TLDR
A method for joining sound event detection and source localization is presented by which coherent auditory objects can be created; it is shown that algorithms able to model context over longer durations benefit particularly in demanding scenes and become more precise in their detection.
Joint Direction and Proximity Classification of Overlapping Sound Events from Binaural Audio
TLDR
This paper proposes two methods of splitting the sphere into angular areas to obtain a set of directional classes, and several ways of combining the proximity and direction estimation problems into a joint task that provides temporal information about the onsets and offsets of the appearing sources (a minimal sketch of such a directional-class mapping appears after this list).
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
TLDR
An overview of the first international evaluation of sound event localization and detection, organized as a task of the DCASE 2019 Challenge, is given; it presents in detail how the systems were evaluated and ranked, as well as the characteristics of the best-performing systems.
Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments
TLDR
A two-stage hierarchical system is presented for SELD of speech in diverse everyday environments, where the audio corresponds to typical usage scenarios of handheld mobile devices; it is evaluated on a database of manually annotated microphone-array recordings from various acoustic conditions.
Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization
TLDR
This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model, which yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions.
The NIGENS General Sound Events Database
TLDR
NIGENS, a database with 714 WAV files containing isolated high-quality sound events of 14 different types, plus 303 'general' WAV files of anything other than these 14 types, is released and presented.
Sound Event Detection: A tutorial
Imagine standing on a street corner in the city. With your eyes closed you can hear and recognize a succession of sounds: cars passing by, people speaking, their footsteps when they walk by, and the…
On Multitask Loss Function for Audio Event Detection and Localization
TLDR
This work proposes a multitask regression model in which both (multi-label) event detection and localization are formulated as regression problems, so that the mean squared error loss can be used homogeneously for model training.
Cooperative abnormal sound event detection in end-edge-cloud orchestrated systems
TLDR
A novel offloading decision-making scheme that leverages hierarchical computational capabilities is proposed to speed up the detection process, avoiding the cumulative latency caused by the increased number of sensors while maintaining high detection accuracy.
Audio Event Detection and Localization with Multitask Regression Network (Technical Report)
TLDR
A multitask regression model is proposed in which both (multi-label) event detection and localization are formulated as regression problems, so that the mean squared error loss can be used homogeneously for model training (a minimal sketch of such a joint loss appears after this list).
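
The directional-class formulation mentioned in the entry on joint direction and proximity classification can be illustrated with a minimal Python sketch. The sector count, elevation band edges, and function name below are illustrative assumptions, not the configuration used in that paper:

  def direction_to_class(azimuth_deg, elevation_deg,
                         n_azimuth=8, elevation_edges=(-90.0, -30.0, 30.0, 90.0)):
      """Map a direction of arrival to a discrete directional class index.

      The sphere is split into n_azimuth equal azimuth sectors and
      len(elevation_edges) - 1 elevation bands; the returned index
      enumerates every sector/band combination.
      """
      az = azimuth_deg % 360.0                      # wrap azimuth into [0, 360)
      az_idx = int(az // (360.0 / n_azimuth))       # equal-width azimuth sector
      el_idx = 0
      for i in range(len(elevation_edges) - 1):     # locate the elevation band
          if elevation_edges[i] <= elevation_deg <= elevation_edges[i + 1]:
              el_idx = i
              break
      return el_idx * n_azimuth + az_idx            # combined class index

  # Example: a source at 95 degrees azimuth, 10 degrees elevation
  print(direction_to_class(95.0, 10.0))             # -> 10 with the defaults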
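
The multitask regression formulation with a homogeneous mean squared error loss, referenced in the last two entries, can be sketched as below. The PyTorch module, tensor shapes, and weighting term are illustrative assumptions rather than the authors' exact model:

  import torch
  import torch.nn as nn

  class MultitaskSELDLoss(nn.Module):
      """Homogeneous MSE loss over event-activity and DOA regression targets."""
      def __init__(self, doa_weight=1.0):
          super().__init__()
          self.mse = nn.MSELoss()
          self.doa_weight = doa_weight

      def forward(self, sed_pred, doa_pred, sed_target, doa_target):
          sed_loss = self.mse(sed_pred, sed_target)    # per-class activity in [0, 1]
          doa_loss = self.mse(doa_pred, doa_target)    # e.g. Cartesian (x, y, z) per class
          return sed_loss + self.doa_weight * doa_loss

  # Example: batch of 4 frames, 14 event classes, 3-D DOA vector per class
  loss_fn = MultitaskSELDLoss(doa_weight=1.0)
  sed_pred, sed_true = torch.rand(4, 14), torch.randint(0, 2, (4, 14)).float()
  doa_pred, doa_true = torch.rand(4, 14, 3), torch.rand(4, 14, 3)
  print(loss_fn(sed_pred, doa_pred, sed_true, doa_true))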

References

Showing 1-10 of 56 references
Robust Detection of Environmental Sounds in Binaural Auditory Scenes
TLDR
It is demonstrated that by superimposing target sounds with strongly varying general environmental sounds during training, sound type classifiers are less affected by the presence of a distractor source, and that the generalization performance of such models depends on how similar the angular source configuration and signal-to-noise ratio are to the conditions under which the models were trained.
A Binaural Scene Analyzer for Joint Localization and Recognition of Speakers in the Presence of Interfering Noise Sources and Reverberation
TLDR
A binaural scene analyzer that is able to simultaneously localize, detect and identify a known number of target speakers in the presence of spatially positioned noise sources and reverberation is presented.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios (an illustrative sketch of such a two-branch SELD output stage appears at the end of this reference list).
Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks
TLDR
A novel framework for binaural sound localization is proposed that combines model-based information about the spectral characteristics of sound sources with deep neural networks (DNNs) in a single computational framework.
Sound-model-based acoustic source localization using distributed microphone arrays
TLDR
A new source localization technique is proposed that works jointly with an acoustic event detection system; the proposed model-based approach appears to be a viable alternative to current techniques for event-based localization.
Detection, classification and localization of acoustic events in the presence of background noise for acoustic surveillance of hazardous situations
TLDR
It is found that the engineered algorithms are sufficiently robust in moderately intense noise to be applied in practical audio-visual surveillance systems.
Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localization of Multiple Sources in Reverberant Environments
TLDR
A novel machine-hearing system is presented that exploits deep neural networks (DNNs) and head movements for robust binaural localization of multiple sources in reverberant environments; it substantially improves localization accuracy under challenging acoustic scenarios in which multiple talkers and room reverberation are present.
Computational speech segregation based on an auditory-inspired modulation analysis (T. May, T. Dau; The Journal of the Acoustical Society of America, 2014)
TLDR
A systematic evaluation of the monaural speech segregation system demonstrates that auditory-inspired modulation processing can substantially improve the mask estimation accuracy in the presence of stationary and fluctuating interferers.
The NIGENS General Sound Events Database
TLDR
NIGENS, a database with 714 WAV files containing isolated high-quality sound events of 14 different types, plus 303 'general' WAV files of anything other than these 14 types, is released and presented.
The what, where and how of auditory-object perception
TLDR
The fundamental perceptual unit in hearing is the 'auditory object', which is the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment.
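
As an illustration of the joint SELD output stage used by convolutional recurrent approaches such as the one listed above, the sketch below shows two parallel heads over a shared recurrent feature sequence: sigmoid event activities and per-class DOA regression. The layer sizes and activations are placeholder assumptions, not the published architecture:

  import torch
  import torch.nn as nn

  class SELDHeads(nn.Module):
      """Two parallel output branches over a shared feature sequence:
      sigmoid event activities and per-class DOA regression."""
      def __init__(self, feature_dim=128, n_classes=14):
          super().__init__()
          self.gru = nn.GRU(feature_dim, 64, batch_first=True, bidirectional=True)
          self.sed_head = nn.Linear(128, n_classes)        # event activity per class
          self.doa_head = nn.Linear(128, n_classes * 3)    # (x, y, z) per class

      def forward(self, features):                         # features: (batch, time, feature_dim)
          hidden, _ = self.gru(features)
          sed = torch.sigmoid(self.sed_head(hidden))       # (batch, time, n_classes)
          doa = torch.tanh(self.doa_head(hidden))          # (batch, time, n_classes * 3)
          return sed, doa

  # Example: 2 clips, 100 frames, 128-dimensional CNN features per frame
  model = SELDHeads()
  sed, doa = model(torch.rand(2, 100, 128))
  print(sed.shape, doa.shape)   # torch.Size([2, 100, 14]) torch.Size([2, 100, 42])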