Joint Direction and Proximity Classification of Overlapping Sound Events from Binaural Audio

Daniel Krause, Archontis Politis, Annamaria Mesaros
Sound source proximity and distance estimation are of great interest in many practical applications, since they provide significant information for acoustic scene analysis. As both tasks share complementary qualities, ensuring efficient interaction between these two is crucial for a complete picture of an aural environment. In this paper, we aim to investigate several ways of performing joint proximity and direction estimation from binaural recordings, both defined as coarse classification…

Feature Overview for Joint Modeling of Sound Event Detection and Localization Using a Microphone Array
Presents a comprehensive comparison of state-of-the-art acoustic features, such as generalized cross-correlation and inter-channel level and phase differences, and proposes features not previously used for this task, such as eigenvectors of the microphone covariance matrix and sines and cosines of the phase differences between channels.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structure, and is robust to unseen DOA values, reverberation, and low-SNR scenarios.
A study on distance estimation in binaural sound localization
  • T. Rodemann
  • Engineering, Computer Science
  • 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems
  • 2010
In an extensive experimental setup with more than 10000 sounds, it is found that both mean signal amplitude and binaural cues can, under certain circumstances, provide a very reliable distance estimation.
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT sound events detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.
Joining Sound Event Detection and Localization Through Spatial Segregation
This article presents an approach that robustly binds localization with the detection of sound events in a binaural robotic system, and demonstrates that it is an effective way to obtain joint sound event location and type information under a wide range of conditions.
Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation
Two novel approaches to estimate the direct-to-reverberant energy ratio (DRR) of binaural signals are presented, based on the interaural magnitude-squared coherence and stochastic maximum likelihood beamforming.
Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks
This paper proposes a method to predict the direction (azimuth) and distance of binaural sound sources simultaneously. With the goal of achieving human-like auditory perception in machines, the…
Joint estimation of binaural distance and azimuth by exploiting deep neural networks.
Experimental results demonstrate that the proposed method not only achieves high azimuth estimation accuracy but also effectively improves distance estimation accuracy compared with several state-of-the-art supervised binaural distance estimation methods.
Robust Detection of Environmental Sounds in Binaural Auditory Scenes
It is demonstrated that, by superimposing target sounds with strongly varying general environmental sounds during training, sound type classifiers are less affected by the presence of a distractor source, and that the generalization performance of such models depends on how similar the angular source configuration and the signal-to-noise ratio are to the conditions under which the models were trained.
Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation
It is found that simply encoding inter-microphone phase patterns as additional input features during deep clustering provides a significant improvement in separation performance, even with random microphone array geometry.