Sound event detection using spatial features and convolutional recurrent neural network

Sharath Adavanne, Pasi Pertilä and Tuomas Virtanen
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a…
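The distinction the abstract draws can be sketched with plain array shapes: concatenating per-channel features into one long vector per frame discards the channel structure, whereas stacking the channels as separate 2-D layers preserves it for a CNN's kernels. The dimensions and variable names below are illustrative, not the paper's exact configuration.

```python
import numpy as np

# Hypothetical setup: 2 audio channels, 40 mel bands, 100 time frames.
n_channels, n_bands, n_frames = 2, 40, 100
features = np.random.rand(n_channels, n_bands, n_frames)

# Option A: concatenate channels into one long feature vector per frame.
# Shape becomes (n_frames, n_channels * n_bands); channel structure is lost.
concatenated = features.transpose(2, 0, 1).reshape(n_frames, -1)

# Option B (what the paper advocates): keep each channel as a separate
# 2-D layer, so convolutional kernels can learn inter-channel patterns.
stacked = features.transpose(2, 0, 1)   # (n_frames, n_channels, n_bands)

print(concatenated.shape)  # (100, 80)
print(stacked.shape)       # (100, 2, 40)
```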


Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features
The proposed method learns to recognize overlapping sound events from multichannel features faster and performs better SED with fewer training epochs.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic and applicable to any array structures, robust to unseen DOA values, reverberation, and low SNR scenarios.
A report on sound event detection with different binaural features
Three different binaural features are studied and evaluated on the publicly available TUT Sound Events 2017 dataset and seen to consistently perform equal to or better than the single-channel features with respect to error rate metric.
Sound Event Detection and Direction of Arrival Estimation using Residual Net and Recurrent Neural Networks
Deep residual nets originally used for image classification are adapted and combined with recurrent neural networks to estimate the onset-offset of sound events, sound events class, and their direction in a reverberant environment to improve the system performance on unseen data.
Sound Event Detection Via Dilated Convolutional Recurrent Neural Networks
This paper investigates the effectiveness of dilation operations, which give a CRNN expanded receptive fields to capture long temporal context without increasing the number of the CRNN's parameters.
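The trade-off that summary describes is simple receptive-field arithmetic: stacking convolutions with exponentially growing dilation widens the temporal context while the weight count stays fixed. A sketch (the exact layer configuration in that paper may differ):

```python
# Receptive field of a stack of 1-D convolutions with given dilations:
# each layer adds (kernel_size - 1) * dilation frames of context.
def receptive_field(kernel_size, dilations):
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

k = 3
plain   = receptive_field(k, [1, 1, 1, 1])  # undilated 4-layer stack
dilated = receptive_field(k, [1, 2, 4, 8])  # exponentially dilated stack

print(plain)    # 9
print(dilated)  # 31
# Both stacks have 4 layers of size-3 kernels, i.e. the same number of
# weights, but dilation more than triples the temporal context here.
```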
A Deep Neural Network-Driven Feature Learning Method for Polyphonic Acoustic Event Detection from Real-Life Recordings
A Deep Neural Network (DNN)-driven feature learning method for polyphonic Acoustic Event Detection (AED) is proposed that outperforms the state-of-the-art methods.
Convolutional Neural Networks with Multi-task Loss for Polyphonic Sound Event Detection
A multi-task loss function is coupled with different neural networks and applied to a polyphonic sound event detection task, and the approach is compared with DNN, CNN and CBRNN methods.
Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction
This paper proposes a novel technique that uses only a single feature, namely the Mel-Frequency Cepstral Coefficient and just three layers of CNN, and demonstrates that such a simple network can considerably outperform several conventional and deep learning-based algorithms.
Stacked convolutional and recurrent neural networks for bird audio detection
Data augmentation by blocks mixing and domain adaptation using a novel method of test mixing are proposed and evaluated in regard to making the method robust to unseen data.
Joint Measurement of Multi-channel Sound Event Detection and Localization Using Deep Neural Network
This paper extracts the phase feature and amplitude feature of the sound spectrum from each audio channel, avoiding feature extraction limited by other microphone arrays.


Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
This work combines these two approaches in a convolutional recurrent neural network (CRNN) and applies it on a polyphonic sound event detection task and observes a considerable improvement for four different datasets consisting of everyday sound events.
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT Sound Events Detection 2016 database, and the use of spatial and harmonic features is shown to improve SED performance.
Environmental sound classification with convolutional neural networks
Karol J. Piczak, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
TUT database for acoustic scene classification and sound event detection
The recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models are presented.
Spatial diffuseness features for DNN-based speech recognition in noisy and reverberant environments
It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, both compared to logmelspec features extracted from noisy signals, and features enhanced by spectral subtraction.
Multichannel Audio Source Separation With Deep Neural Networks
This article proposes a framework where deep neural networks are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information and presents its application to a speech enhancement problem.
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources.
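One widely used metric from this line of work is the segment-based error rate, ER = (S + D + I) / N, where substitutions S, deletions D and insertions I are counted per evaluation segment. The sketch below is my own minimal implementation of that formula; consult the paper for the authoritative definition and edge cases.

```python
# Segment-based error rate for polyphonic SED (a sketch, variable
# names are mine). reference/estimated: lists of sets of active
# event labels, one set per evaluation segment.
def segment_error_rate(reference, estimated):
    S = D = I = N = 0
    for ref, est in zip(reference, estimated):
        fn = len(ref - est)      # missed events in this segment
        fp = len(est - ref)      # spurious events in this segment
        S += min(fn, fp)         # each substitution pairs one FN with one FP
        D += max(0, fn - fp)     # remaining misses are deletions
        I += max(0, fp - fn)     # remaining false alarms are insertions
        N += len(ref)            # total reference events
    return (S + D + I) / N if N else 0.0

ref = [{"car", "speech"}, {"speech"}, {"dog"}]
est = [{"car"}, {"speech", "dog"}, set()]
print(segment_error_rate(ref, est))  # 0.75 (2 deletions + 1 insertion, 4 refs)
```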
Audio context recognition using audio event histograms
This paper presents a method for audio context recognition, meaning classification between everyday environments. The method is based on representing each audio context using a histogram of audio events.
Acoustic Event Detection: SVM-Based System and Evaluation Setup in CLEAR'07
In this paper, the Acoustic Event Detection (AED) system developed at the UPC is described, and its results in the CLEAR evaluations carried out in March 2007 are reported.
Audio keyword generation for sports video analysis
This work presents a flexible Hidden Markov Model (HMM)-based audio keyword generation system that treats an audio keyword as a continuous time series data and employs hidden states transition to capture contexts.