Robust sound event detection in binaural computational auditory scene analysis

@inproceedings{Trowitzsch2020RobustSE,
  title={Robust sound event detection in binaural computational auditory scene analysis},
  author={Ivo Trowitzsch},
  year={2020}
}
Automatic sound event detection and computational auditory scene analysis are gaining importance with the increasing prevalence of technical systems that operate autonomously or in the background, since such operation requires awareness of the system’s environment. In realistic scenes, reliable sound event detection still poses a difficult problem, despite the large improvements in the related field of automatic speech recognition: general sounds are often less definable than speech and exhibit less…
