Sound event detection using spatial features and convolutional recurrent neural network
@article{Adavanne2017SoundED,
  title={Sound event detection using spatial features and convolutional recurrent neural network},
  author={Sharath Adavanne and Pasi Pertil{\"a} and Tuomas Virtanen},
  journal={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2017},
  pages={771-775}
}
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a…
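To make the core idea concrete: the multichannel time-frequency features are stacked along a depth dimension of an input volume (one layer per feature type or channel) rather than flattened into one long vector, and that volume is processed by a convolutional recurrent network that outputs per-frame event activities. The following is only a minimal PyTorch-style sketch under assumed dimensions and layer sizes, not the authors' exact architecture or feature set.

```python
# Minimal sketch (illustrative sizes, not the published configuration):
# multichannel features are stacked as separate layers of a volume with
# shape (batch, n_feature_layers, n_frames, n_bins) and fed to a small
# convolutional recurrent network producing per-frame class probabilities.
import torch
import torch.nn as nn

class SimpleCRNN(nn.Module):
    def __init__(self, n_feature_layers=4, n_bins=40, n_classes=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_feature_layers, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 5)),              # pool only along frequency, keep time resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_bins // 5 // 4), 64,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (batch, layers, frames, bins)
        h = self.conv(x)                       # (batch, 64, frames, reduced_bins)
        h = h.permute(0, 2, 1, 3).flatten(2)   # (batch, frames, 64 * reduced_bins)
        h, _ = self.gru(h)
        return torch.sigmoid(self.fc(h))       # per-frame multi-label activity

# Example: 4 feature layers (e.g. per-channel log-mel plus spatial features),
# 500 frames, 40 frequency bins.
x = torch.randn(8, 4, 500, 40)
probs = SimpleCRNN()(x)                        # shape (8, 500, 6)
```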
100 Citations
Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features
- Computer Science · 2018 International Joint Conference on Neural Networks (IJCNN)
- 2018
The proposed method learns to recognize overlapping sound events from multichannel features faster and performs better SED with fewer training epochs.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
- Computer Science · IEEE Journal of Selected Topics in Signal Processing
- 2019
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios.
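The SELD system summarized above shares a convolutional recurrent feature extractor between two output branches: multi-label event activity and direction-of-arrival (DOA) regression. A minimal sketch of such a two-branch output, assuming per-frame features are already available (the sizes and the Cartesian DOA parametrization are illustrative, not the published configuration):

```python
import torch
import torch.nn as nn

class SELDHeads(nn.Module):
    """Two output branches on top of shared per-frame CRNN features
    (illustrative sizes only)."""
    def __init__(self, feat_dim=128, n_classes=11):
        super().__init__()
        self.sed = nn.Linear(feat_dim, n_classes)      # event activity per class
        self.doa = nn.Linear(feat_dim, 3 * n_classes)  # (x, y, z) per class

    def forward(self, h):                              # h: (batch, frames, feat_dim)
        sed = torch.sigmoid(self.sed(h))               # multi-label detection
        doa = torch.tanh(self.doa(h))                  # Cartesian DOA in [-1, 1]
        return sed, doa

h = torch.randn(2, 100, 128)                           # shared per-frame features
sed, doa = SELDHeads()(h)                              # (2, 100, 11), (2, 100, 33)
```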
A report on sound event detection with different binaural features
- Physics, Computer Science · ArXiv
- 2017
Three different binaural features are studied and evaluated on the publicly available TUT Sound Events 2017 dataset, and are seen to consistently perform equal to or better than the single-channel features with respect to the error rate metric.
Sound Event Detection and Direction of Arrival Estimation using Residual Net and Recurrent Neural Networks
- Computer Science · DCASE
- 2019
Deep residual nets originally used for image classification are adapted and combined with recurrent neural networks to estimate the onsets and offsets of sound events, their classes, and their directions of arrival in a reverberant environment, improving system performance on unseen data.
Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper investigates the effectiveness of dilation operations, which provide a CRNN with expanded receptive fields to capture long temporal context without increasing the number of the CRNN's parameters.
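As a quick illustration of the dilation idea (not the cited paper's configuration): a dilated convolution keeps the same number of weights as an ordinary convolution of the same kernel size while covering a wider neighbourhood of the input.

```python
import torch.nn as nn

# Both layers use 3x3 kernels and therefore have identical parameter counts,
# but the dilated layer spans a 5x5 neighbourhood of its input.
standard = nn.Conv2d(32, 32, kernel_size=3, padding=1)
dilated  = nn.Conv2d(32, 32, kernel_size=3, padding=2, dilation=2)

n_params = lambda m: sum(p.numel() for p in m.parameters())
assert n_params(standard) == n_params(dilated)   # same parameter budget
```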
A Deep Neural Network-Driven Feature Learning Method for Polyphonic Acoustic Event Detection from Real-Life Recordings
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
A Deep Neural Network (DNN)-driven feature learning method for polyphonic Acoustic Event Detection (AED) is proposed that outperforms the state-of-the-art methods.
Convolutional Neural Networks with Multi-task Loss for Polyphonic Sound Event Detection
- Computer Science · CSAE '18
- 2018
A multi-task loss function is coupled with different neural networks and applied to a polyphonic sound event detection task, and the approach is compared with DNN, CNN and CBRNN methods.
Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction
- Computer Science · Neural Comput. Appl.
- 2021
This paper proposes a novel technique that uses only a single feature, namely Mel-Frequency Cepstral Coefficients, and just three CNN layers, and demonstrates that such a simple network can considerably outperform several conventional and deep learning-based algorithms.
Stacked convolutional and recurrent neural networks for bird audio detection
- Computer Science · 2017 25th European Signal Processing Conference (EUSIPCO)
- 2017
Data augmentation by block mixing and domain adaptation using a novel test-mixing method are proposed and evaluated with regard to making the method robust to unseen data.
Joint Measurement of Multi-channel Sound Event Detection and Localization Using Deep Neural Network
- Computer Science · Journal of Physics: Conference Series
- 2022
This paper extracts phase and amplitude features of the sound spectrum from each audio channel, so that feature extraction is not tied to a particular microphone array geometry.
References
Showing 1-10 of 21 references
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2017
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
- Computer Science, Physics · DCASE
- 2016
The proposed SED system is compared against the state-of-the-art monochannel method on the development subset of the TUT Sound Events Detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.
Environmental sound classification with convolutional neural networks
- Computer Science · 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
- 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
TUT database for acoustic scene classification and sound event detection
- Computer Science, Physics · 2016 24th European Signal Processing Conference (EUSIPCO)
- 2016
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel-frequency cepstral coefficients and Gaussian mixture models are presented.
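The baseline referenced above pairs MFCCs with Gaussian mixture models. A minimal per-class GMM classifier sketch using librosa and scikit-learn, with illustrative parameter values and a hypothetical `train_files_by_class` mapping rather than the baseline's published settings, could look like this:

```python
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=44100, n_mfcc=20):
    """Frame-wise MFCCs for one recording, shape (frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_gmms(train_files_by_class, n_components=16):
    """Fit one GMM per class on the pooled training frames of that class."""
    gmms = {}
    for label, paths in train_files_by_class.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        gmms[label] = GaussianMixture(n_components=n_components).fit(X)
    return gmms

def classify(path, gmms):
    """Pick the class whose GMM gives the highest average log-likelihood."""
    X = mfcc_frames(path)
    return max(gmms, key=lambda label: gmms[label].score(X))
```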
Spatial diffuseness features for DNN-based speech recognition in noisy and reverberant environments
- Computer Science · 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate on the REVERB challenge corpus, compared both to log-mel spectral features extracted from noisy signals and to features enhanced by spectral subtraction.
Multichannel Audio Source Separation With Deep Neural Networks
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
This article proposes a framework in which deep neural networks are used to model the source spectra and are combined with the classical multichannel Gaussian model to exploit spatial information, and presents its application to a speech enhancement problem.
Metrics for Polyphonic Sound Event Detection
- Computer Science
- 2016
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources…
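The metrics discussed in the reference above include the segment-based F-score and error rate that later SED work (including the paper on this page) commonly reports. A small sketch, assuming binary activity matrices of shape (segments, classes), computes them as follows:

```python
import numpy as np

def segment_metrics(ref, est):
    """Segment-based F-score and error rate from binary activity
    matrices of shape (n_segments, n_classes)."""
    tp = np.logical_and(ref == 1, est == 1).sum()
    fp = np.logical_and(ref == 0, est == 1).sum()
    fn = np.logical_and(ref == 1, est == 0).sum()
    f_score = 2 * tp / (2 * tp + fp + fn + 1e-12)

    # Per-segment substitutions, deletions and insertions.
    fn_seg = np.logical_and(ref == 1, est == 0).sum(axis=1)
    fp_seg = np.logical_and(ref == 0, est == 1).sum(axis=1)
    s = np.minimum(fn_seg, fp_seg).sum()
    d = np.maximum(0, fn_seg - fp_seg).sum()
    i = np.maximum(0, fp_seg - fn_seg).sum()
    error_rate = (s + d + i) / (ref.sum() + 1e-12)
    return f_score, error_rate
```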
Audio context recognition using audio event histograms
- Computer Science · 2010 18th European Signal Processing Conference
- 2010
This paper presents a method for audio context recognition, meaning classification between everyday environments. The method is based on representing each audio context using a histogram of audio…
Acoustic Event Detection: SVM-Based System and Evaluation Setup in CLEAR'07
- Physics · CLEAR
- 2007
In this paper, the Acoustic Event Detection (AED) system developed at the UPC is described, and its results in the CLEAR evaluations carried out in March 2007 are reported. The system uses a set of…
Audio keyword generation for sports video analysis
- Computer Science, Education · MULTIMEDIA '04
- 2004
This work presents a flexible Hidden Markov Model (HMM)-based audio keyword generation system that treats an audio keyword as continuous time-series data and employs hidden-state transitions to capture context.