Corpus ID: 449184

Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features

@inproceedings{Adavanne2016SoundED,
  title={Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features},
  author={Sharath Adavanne and Giambattista Parascandolo and Pasi Pertil{\"a} and Toni Heittola and Tuomas Virtanen},
  booktitle={DCASE},
  year={2016}
}
In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task. Real-life sound recordings typically contain many overlapping sound events, which are hard to recognize from mono-channel audio alone. Human listeners successfully recognize mixtures of overlapping sound events by using pitch cues and by exploiting the stereo (multichannel) audio signal available… 
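The abstract describes combining frame-level spatial and harmonic features from multichannel audio with an LSTM RNN that predicts multiple overlapping event classes per frame. As a rough illustration of that model family only, not the authors' implementation, the sketch below shows a small bidirectional LSTM with per-frame sigmoid outputs for multi-label (polyphonic) SED; the feature dimension, hidden size, and class count are placeholder values.

# Hypothetical sketch of a multi-label (polyphonic) SED model: a bidirectional
# LSTM over frame-level features with one sigmoid output per event class and
# per frame. Feature size, hidden size, and number of classes are illustrative.
import torch
import torch.nn as nn

class PolyphonicSEDLSTM(nn.Module):
    def __init__(self, n_features: int = 120, n_hidden: int = 64, n_classes: int = 18):
        super().__init__()
        # Bidirectional LSTM reads the whole feature sequence (batch, time, features).
        self.lstm = nn.LSTM(n_features, n_hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # One sigmoid unit per class and per frame allows overlapping events.
        self.head = nn.Linear(2 * n_hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                 # (batch, time, 2 * n_hidden)
        return torch.sigmoid(self.head(out))  # frame-wise class activity probabilities

# Example: a batch of 8 clips, 256 frames each, with an assumed 120-dimensional
# multichannel feature vector per frame.
model = PolyphonicSEDLSTM()
features = torch.randn(8, 256, 120)
activity = model(features)                    # (8, 256, 18), values in [0, 1]
targets = torch.randint(0, 2, activity.shape).float()
loss = nn.functional.binary_cross_entropy(activity, targets)

Frame-wise probabilities from such a network are typically thresholded (for example at 0.5) to obtain binary event activity before evaluation.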

Citations

SOUND EVENT DETECTION IN MULTICHANNEL AUDIO LSTM NETWORK
TLDR
A polyphonic sound event detection system that uses log mel-band energy features with a long short-term memory (LSTM) recurrent neural network on multichannel audio data and achieves superior performance compared with the baselines (a minimal feature-extraction sketch in this vein follows the citation list).
Acoustic Event Detection in Multichannel Audio Using Gated Recurrent Neural Networks with High-Resolution Spectral Features
TLDR
This paper presents an approach to improve the accuracy of polyphonic sound event detection in multichannel audio based on gated recurrent neural networks in combination with auditory spectral features, and reveals that the proposed method outperforms the conventional approaches.
Sound event detection using spatial features and convolutional recurrent neural network
TLDR
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection, and shows that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a volume.
Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features
TLDR
The proposed method learns to recognize overlapping sound events from multichannel features faster and achieves better SED performance with fewer training epochs.
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
TLDR
The proposed convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space is generic, applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios.
THREE-STAGE APPROACH FOR SOUND EVENT LOCALIZATION AND DETECTION Technical Report
TLDR
This paper describes a three-stage system for the sound event localization and detection (SELD) task, which employs the multi-resolution cochleagram from 4-channel audio and a convolutional recurrent neural network (CRNN) model to detect sound activity.
Polyphonic Sound Event Detection with Weak Labeling
TLDR
This thesis proposes to train deep learning models for SED using various levels of weak labeling, and shows that the sound events can be learned and localized by a recurrent neural network (RNN) with a connectionist temporal classification (CTC) output layer, which is well suited for sequential supervision.
A report on sound event detection with different binaural features
TLDR
Three different binaural features are studied and evaluated on the publicly available TUT Sound Events 2017 dataset, and are seen to consistently perform equal to or better than single-channel features with respect to the error rate metric.
Robust Polyphonic Sound Event Detection by Using Multi Frame Size Denoising Autoencoder
TLDR
This paper proposes to use a denoising autoencoder, trained on multi-frame-size information of audio signals, to extract robust features for polyphonic sound event detection under noisy conditions.
...
...
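Several of the works above, including the LSTM system summarized at the top of the list, build their input from log mel-band energies computed per channel. The following is a rough sketch of that feature-extraction step, assuming librosa is used; the file name, number of mel bands, and frame settings are placeholder choices and may differ from the papers' exact configurations.

# Hypothetical feature-extraction sketch: per-channel log mel-band energies
# from a multichannel recording, concatenated frame-wise. The file name,
# number of mel bands, and frame settings are illustrative assumptions.
import numpy as np
import librosa

def multichannel_log_mel(path: str, n_mels: int = 40,
                         n_fft: int = 2048, hop_length: int = 1024) -> np.ndarray:
    # mono=False keeps every channel; y has shape (n_channels, n_samples).
    y, sr = librosa.load(path, sr=None, mono=False)
    if y.ndim == 1:  # handle mono files gracefully
        y = y[np.newaxis, :]
    channel_feats = []
    for channel in y:
        mel = librosa.feature.melspectrogram(y=channel, sr=sr, n_fft=n_fft,
                                             hop_length=hop_length, n_mels=n_mels)
        channel_feats.append(librosa.power_to_db(mel))  # log mel-band energies
    # Stack channels along the feature axis: (time, n_channels * n_mels).
    return np.concatenate(channel_feats, axis=0).T

# features = multichannel_log_mel("binaural_recording.wav")  # hypothetical path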

References

SHOWING 1-10 OF 35 REFERENCES
Polyphonic sound event detection using multi label deep neural networks
TLDR
Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification in this work, and the proposed method improves the accuracy by 19 percentage points overall.
Recurrent neural networks for polyphonic sound event detection in real life recordings
In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short-term memory (BLSTM) recurrent neural networks (RNNs). A single…
TUT database for acoustic scene classification and sound event detection
TLDR
The recording and annotation procedure, the database content, a recommended cross-validation setup, and the performance of a supervised acoustic scene classification system and an event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models are presented.
Environmental Sound Recognition With Time–Frequency Audio Features
TLDR
An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed to obtain effective time-frequency features that yield higher recognition accuracy for environmental sounds.
Overlapping sound event recognition using local spectrogram features and the generalised Hough transform
Metrics for Polyphonic Sound Event Detection
This paper presents and discusses various metrics proposed for the evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources… (a minimal computation sketch of the segment-based metrics appears after this reference list).
Detecting sound events in basketball video archive
TLDR
A method for detecting sound events in a basketball game is proposed, focusing on detecting cheering sounds with a template-matching-based approach; it can be used for basketball video content retrieval and highlight extraction.
Acoustic event detection in real life recordings
TLDR
A system for acoustic event detection in recordings from real-life environments is presented, based on a network of hidden Markov models; it is capable of recognizing almost one third of the events, although the temporal positioning of the events is incorrect 84% of the time.
Deep beamforming networks for multi-channel speech recognition
TLDR
This work proposes to represent the stages of acoustic processing, including beamforming, feature extraction, and acoustic modeling, as three components of a single unified computational network, which obtained a 3.2% absolute word error rate reduction compared to a conventional pipeline of independent processing stages.
NMF-based environmental sound source separation using time-variant gain features
...
...
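For the reference "Metrics for Polyphonic Sound Event Detection" above, the widely used segment-based error rate and F-score can be computed from binary activity matrices as sketched below. This is an illustrative implementation of the standard per-segment substitution/deletion/insertion definitions, not code from any of the papers; the function name, segment granularity, and example data are assumptions.

# Hedged sketch of segment-based SED metrics: error rate (ER) and F-score
# computed from binary reference and estimated activity matrices of shape
# (n_segments, n_classes). Input names and example data are illustrative.
import numpy as np

def segment_based_metrics(reference: np.ndarray, estimated: np.ndarray):
    ref = reference.astype(bool)
    est = estimated.astype(bool)
    tp = np.logical_and(ref, est).sum()
    fp = np.logical_and(~ref, est).sum(axis=1)  # false positives per segment
    fn = np.logical_and(ref, ~est).sum(axis=1)  # false negatives per segment
    n_ref = ref.sum(axis=1)                     # active reference events per segment

    # Per-segment substitutions, deletions, and insertions.
    subs = np.minimum(fp, fn)
    dels = np.maximum(0, fn - fp)
    ins = np.maximum(0, fp - fn)
    error_rate = (subs + dels + ins).sum() / max(n_ref.sum(), 1)

    precision = tp / max(tp + fp.sum(), 1)
    recall = tp / max(tp + fn.sum(), 1)
    f_score = 2 * precision * recall / max(precision + recall, 1e-12)
    return error_rate, f_score

# Example with four hypothetical one-second segments and three event classes.
ref = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
est = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]])
print(segment_based_metrics(ref, est))  # ER = 0.6, F-score ≈ 0.67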